Building the infrastructure for autonomous intelligence.
I design and ship production-scale AI systems — from high-throughput inference engines to multi-agent orchestration frameworks. Focused on the intersection of systems engineering and frontier AI research.
What I Build
Marvin
High-performance inference backbone serving heterogeneous LLM deployments. Optimized for throughput via dynamic quantization, speculative decoding, and intelligent KV cache management across diverse model architectures.
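To illustrate the speculative-decoding idea mentioned above, here is a minimal sketch of the draft/verify loop. The lookup-table "models" (`draft_table`, `target_table`) are toy stand-ins invented for this example, not Marvin's actual interfaces: a cheap draft model proposes a burst of tokens, and the expensive target model verifies them in one pass, accepting the longest agreeing prefix.

```python
def draft_propose(prefix, k, draft_table):
    """Cheap draft model: propose k tokens via greedy next-token lookup."""
    out, last = [], prefix[-1]
    for _ in range(k):
        nxt = draft_table.get(last, 0)
        out.append(nxt)
        last = nxt
    return out

def target_verify(prefix, proposed, target_table):
    """Expensive target model checks the proposals in one batch and accepts
    the longest prefix it agrees with, then emits one token of its own."""
    accepted, last = [], prefix[-1]
    for tok in proposed:
        if tok != target_table.get(last, 0):
            break
        accepted.append(tok)
        last = tok
    accepted.append(target_table.get(last, 0))  # target's own contribution
    return accepted

def speculative_decode(prompt, steps, k, draft_table, target_table):
    """Each step costs one target pass but can emit up to k+1 tokens."""
    seq = list(prompt)
    for _ in range(steps):
        proposed = draft_propose(seq, k, draft_table)
        seq.extend(target_verify(seq, proposed, target_table))
    return seq
```

When draft and target agree, each verification pass yields k+1 tokens; when they diverge, the loop falls back to one correct target token, so output always matches what the target alone would produce.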
Eddie
Multi-agent orchestration framework for autonomous task solving. Features hierarchical reasoning with stateful agent collaboration, robust guardrails, and persistent state management for complex, long-horizon enterprise workflows.
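The plan/dispatch/checkpoint shape described above can be sketched in a few lines. Everything here (`plan`, `guardrail`, the `workers` routing convention, the `state` dict) is a hypothetical simplification for illustration, not Eddie's real API:

```python
def plan(goal):
    """Toy planner: split a goal string into ordered subtasks."""
    return [s.strip() for s in goal.split(";") if s.strip()]

def guardrail(result):
    """Stand-in policy check: reject outputs carrying a blocked marker."""
    return "FORBIDDEN" not in result

def run_workflow(goal, workers, state=None):
    """Plan -> dispatch each subtask to a worker agent -> checkpoint state.
    Persisted state lets a long-horizon workflow resume after interruption."""
    state = state if state is not None else {"done": [], "failed": []}
    for task in plan(goal):
        if task in state["done"]:  # resume support: skip finished steps
            continue
        agent = workers.get(task.split()[0], workers["default"])
        result = agent(task)
        if guardrail(result):
            state["done"].append(task)
        else:
            state["failed"].append(task)
    return state
```

Passing the returned `state` back into `run_workflow` is the toy analogue of persistent state management: completed subtasks are skipped on re-entry rather than re-executed.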
Vortex
Kubernetes-native AI platform toolkit for multi-tenant inference clusters. Provides unified request routing, zero-trust service mesh, and automated GPU workload scaling with CLI-driven lifecycle management.
Frontier AI Papers
Regular deep-dives into frontier research — from infinite-context architectures and memory-efficient training to RLHF alignment. Each reading comes with a detailed presentation and implementation insights.
Clinical Trial Chatbot
Intelligent Q&A system for clinical trial documents featuring named entity recognition for adverse drug events, Haystack-based retrieval, and an LLM-powered dashboard for interactive exploration of medical literature.
Drug Consumption Forecasting
Predictive analytics pipeline for pharmaceutical demand planning using ensemble time-series methods. Achieved MAPE under 20% by combining ARIMA, Holt-Winters exponential smoothing, and XGBoost regressors.
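As a flavor of the ensemble approach, here is a dependency-free sketch: a simple exponential-smoothing forecaster (a stripped-down relative of Holt-Winters, without trend or seasonality) blended with a persistence baseline, scored by MAPE. The weights and components are illustrative, not the pipeline's actual configuration:

```python
def ses_forecast(series, alpha=0.5):
    """Simple exponential smoothing: recursive level update,
    one-step-ahead forecast is the final level."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def naive_forecast(series):
    """Persistence baseline: repeat the last observation."""
    return series[-1]

def ensemble_forecast(series, weights=(0.5, 0.5)):
    """Weighted blend of the component forecasts."""
    components = (ses_forecast(series), naive_forecast(series))
    return sum(w * f for w, f in zip(weights, components))

def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - f) / abs(a)
                       for a, f in zip(actuals, forecasts)) / len(actuals)
```

The real pipeline swaps in ARIMA, full Holt-Winters, and XGBoost for the components, but the blend-then-score structure is the same.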
Abnormal Event Detection in Video
Real-time anomaly detection system for surveillance video streams, achieving 95% precision on the UCSD Anomaly Detection Dataset. Combines spatial-temporal feature extraction with unsupervised learning.
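A minimal sketch of the unsupervised thresholding idea: frame-difference motion energy (a crude stand-in for the system's spatial-temporal features) compared against running statistics of recent frames. The window size and threshold multiplier are illustrative assumptions:

```python
def motion_energy(prev, curr):
    """Sum of absolute pixel differences between consecutive frames."""
    return sum(abs(a - b)
               for row_p, row_c in zip(prev, curr)
               for a, b in zip(row_p, row_c))

def detect_anomalies(frames, window=5, k=3.0):
    """Flag frame indices whose motion energy exceeds mean + k*std of the
    preceding window of energies -- unsupervised, no labels required."""
    energies, flagged = [], []
    for i in range(1, len(frames)):
        e = motion_energy(frames[i - 1], frames[i])
        ref = energies[-window:]
        if len(ref) >= 2:
            mu = sum(ref) / len(ref)
            std = (sum((x - mu) ** 2 for x in ref) / len(ref)) ** 0.5
            if e > mu + k * std:
                flagged.append(i)
        energies.append(e)
    return flagged
```

Because the threshold adapts to the recent distribution, scenes with steady motion are not flagged; only a sudden departure from the local norm is.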
"The future of AI isn't just smarter models — it's smarter systems. I believe the most impactful work happens at the intersection of rigorous engineering and frontier research, where ideas are stress-tested against production reality."
Research Talks & Deep Dives
Modern Language Model Architectures: From Papers to Practice
Decoding architectural patterns across 19+ LLMs — what actually works and why
Fast and Simplex: 2-Simplicial Attention in Triton
Rethinking transformer attention with trilinear forms for better scaling laws under token constraints
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Efficient long-context modeling at 64K+ tokens with up to 9x speedup over FlashAttention-2
StreamingLLM: Efficient Streaming Language Models with Attention Sinks
Enabling infinite-context inference without KV cache explosion
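The eviction policy behind attention sinks is simple enough to sketch. This toy cache (names and sizes are assumptions for illustration, not the paper's code) permanently retains the first few "sink" positions and keeps only a sliding window of recent entries, so memory stays bounded however long the stream runs:

```python
from collections import deque

class SinkKVCache:
    """Rolling KV cache in the attention-sinks style: the first n_sink
    positions are kept forever, the rest live in a bounded sliding window."""
    def __init__(self, n_sink=4, window=8):
        self.n_sink = n_sink
        self.sinks = []                      # initial tokens, never evicted
        self.recent = deque(maxlen=window)   # auto-evicts oldest on append

    def append(self, kv):
        if len(self.sinks) < self.n_sink:
            self.sinks.append(kv)
        else:
            self.recent.append(kv)

    def view(self):
        """Cache entries visible to attention at the current step."""
        return self.sinks + list(self.recent)
```

The paper's observation is that those early sink positions absorb a disproportionate share of attention mass, so dropping them (as a plain sliding window would) degrades generation badly.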
AlphaGeometry: AI for Olympiad-Level Geometry
Neuro-symbolic system combining language models with symbolic deduction
Infini-Attention: Infinite Context Transformers
Compressive memory for infinite context length in transformers
GaLore: Memory-Efficient LLM Training
Reducing fine-tuning memory footprint via gradient projection
WARP: Weight Averaged Rewarded Policies for RLHF
Improving reward stability and alignment in RLHF fine-tuning
LARS & Efficient Optimizer Landscapes
Layer-wise adaptive rate scaling and large-batch convergence