Lead AI Engineer

Building the infrastructure for autonomous intelligence.

I design and ship production-scale AI systems, from high-throughput inference engines to multi-agent orchestration frameworks. My focus is the intersection of systems engineering and frontier AI research.

Amit Kumar

What I Build

Core Infrastructure

Marvin

High-performance inference backbone serving heterogeneous LLM deployments. Optimized for maximum throughput with dynamic quantization, speculative decoding, and intelligent KV cache management across diverse model architectures.

C++, CUDA, Speculative Decoding, INT4/INT8

Agentic AI

Eddie

Multi-agent orchestration framework for autonomous task solving. Features hierarchical reasoning with stateful agent collaboration, robust guardrails, and persistent state management for complex, long-horizon enterprise workflows.

LangGraph, Multi-Agent, Guardrails, State Machines

Cloud & Scale

Vortex

Kubernetes-native AI platform toolkit for multi-tenant inference clusters. Provides unified request routing, zero-trust service mesh, and automated GPU workload scaling with CLI-driven lifecycle management.

Kubernetes, GKE, Service Mesh, LiteLLM

Research & Writing

Frontier AI Papers

Regular deep dives into frontier research, from infinite-context architectures and memory-efficient training to RLHF alignment. Each paper reading comes with a detailed presentation and implementation notes.

StreamingLLM, Infini-Attention, GaLore, WARP

Healthcare NLP

Clinical Trial Chatbot

Intelligent Q&A system for clinical trial documents featuring named entity recognition for adverse drug events, Haystack-based retrieval, and an LLM-powered dashboard for interactive exploration of medical literature.

NER, Haystack, LLM, Clinical NLP

Time-Series ML

Drug Consumption Forecasting

Predictive analytics pipeline for pharmaceutical demand planning using ensemble time-series methods. Achieved a mean absolute percentage error (MAPE) under 20% by combining ARIMA, Holt-Winters exponential smoothing, and XGBoost regressors.
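For readers unfamiliar with the metric, here is a minimal sketch of how MAPE can be computed for a combined forecast. The source does not say how the three models were combined, so the equal-weight average below is an assumption, and the helper names (mape, average_ensemble) and all numbers are hypothetical placeholders rather than project results.

```python
from statistics import fmean


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Points where the actual value is zero are skipped, because the
    percentage error is undefined there.
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * fmean(abs(a - f) / abs(a) for a, f in pairs)


def average_ensemble(*forecasts):
    """Equal-weight ensemble: element-wise mean of same-length forecasts."""
    return [fmean(step) for step in zip(*forecasts)]


# Hypothetical demand values and per-model forecasts (illustration only).
actual = [100.0, 120.0, 80.0, 110.0]
arima = [90.0, 130.0, 85.0, 100.0]
holt_winters = [110.0, 115.0, 70.0, 120.0]
xgboost = [100.0, 121.0, 79.0, 110.0]

combined = average_ensemble(arima, holt_winters, xgboost)
score = mape(actual, combined)
print(f"Ensemble MAPE: {score:.2f}%")
```

In this toy data the individual errors partly cancel, which is the usual motivation for averaging models with different biases. A weighted or stacked combination would be a natural alternative.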

ARIMA, Holt-Winters, XGBoost, Forecasting

Computer Vision

Abnormal Event Detection in Video

Real-time anomaly detection system for surveillance video streams, achieving 95% precision on the UCSD Anomaly Detection Dataset. Combines spatio-temporal feature extraction with unsupervised learning.

CNN, Anomaly Detection, Video Analysis, 95% Precision

Philosophy
"The future of AI isn't just smarter models — it's smarter systems. I believe the most impactful work happens at the intersection of rigorous engineering and frontier research, where ideas are stress-tested against production reality."

Research Talks & Deep Dives

2024

StreamingLLM: Efficient Streaming Language Models with Attention Sinks

Enabling stable inference over unbounded input streams with a fixed-size KV cache

Presentation
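The cache policy behind attention sinks can be sketched in a few lines: keep the first few tokens (the "sinks") plus a sliding window of the most recent tokens, and evict everything in between. This is an illustrative sketch of the idea only, not the paper's reference implementation. The function name and the default sizes are assumptions chosen for the example.

```python
def streaming_cache_positions(seq_len, n_sink=4, window=8):
    """Return the original token positions kept in the KV cache.

    The cache holds the first `n_sink` tokens (attention sinks) and the
    most recent `window` tokens, so its size never exceeds
    n_sink + window no matter how long the stream gets.
    """
    if seq_len <= n_sink + window:
        # Nothing needs to be evicted yet.
        return list(range(seq_len))
    sinks = list(range(n_sink))
    recent = list(range(seq_len - window, seq_len))
    return sinks + recent


# After 20 tokens, positions 4..11 have been evicted.
kept = streaming_cache_positions(20)
print(kept)
```

Because the retained set is bounded, memory stays constant as the stream grows. The trade-off is that evicted tokens are no longer visible to the model, so this does not extend the usable context window.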
2024

AlphaGeometry: AI for Olympiad-Level Geometry

Neuro-symbolic system combining language models with symbolic deduction

Presentation
2024

Infini-Attention: Infinite Context Transformers

Compressive memory for infinite context length in transformers

Presentation
2024

GaLore: Memory-Efficient LLM Training

Reducing optimizer-state memory during training via low-rank gradient projection

Presentation
2024

WARP: Weight Averaged Rewarded Policies for RLHF

Improving the trade-off between reward and KL drift in RLHF by averaging policy weights

Presentation
2024

LARS & Efficient Optimizer Landscapes

Layer-wise adaptive rate scaling and large-batch convergence

Paper Reading
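As a brief illustration of layer-wise adaptive rate scaling, the sketch below computes the per-layer "trust ratio" that LARS uses to scale the global learning rate: the norm of a layer's weights divided by the norm of its gradient plus a weight-decay term. The function name, default coefficients, and the fallback value for zero norms are assumptions for this example, and a layer is represented as a flat list of floats to keep it dependency-free.

```python
import math


def lars_local_lr(weights, grads, trust_coef=0.001, weight_decay=0.0):
    """Layer-wise learning-rate multiplier in the style of LARS.

    local_lr = trust_coef * ||w|| / (||g|| + weight_decay * ||w||)
    """
    w_norm = math.sqrt(sum(w * w for w in weights))
    g_norm = math.sqrt(sum(g * g for g in grads))
    if w_norm == 0.0 or g_norm == 0.0:
        # Assumed fallback: leave the global learning rate unscaled.
        return 1.0
    return trust_coef * w_norm / (g_norm + weight_decay * w_norm)


# A layer with large weights and small gradients gets a larger step
# than a layer with small weights and large gradients.
layer_a = lars_local_lr([3.0, 4.0], [0.6, 0.8])
layer_b = lars_local_lr([0.3, 0.4], [6.0, 8.0])
print(layer_a, layer_b)
```

Scaling each layer's step by this ratio keeps the update size proportional to the weight size, which is what makes very large batch training more stable.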

Journey

2023 — Present
Lead AI Engineer
e42.ai
Leading architecture and deployment of production-scale generative AI platforms. Directing multi-agent systems research, distributed inference infrastructure, and LLM serving pipelines. Mentoring engineers across the full GenAI stack.
Distributed Systems, Autonomous Agents, LLM Serving, System Design
Nov 2021 — May 2023
NLP Engineer
Circlebase (Clinical NLP)
Joined as an intern and was promoted to a full-time role. Built entity recognition pipelines for social determinants of health and drug relations. Designed production NLP services and deployment infrastructure.
NLP, Entity Recognition, Production ML
Sept — Nov 2021
Lead ML Engineer
Omdena — Internet & Jurisdiction Policy Network
Built knowledge graph pipelines and NLP-driven triplet extractors for policy document analysis. Contributed to the Datasphere framework by developing text classification and information extraction tools for internet governance datasets.
Knowledge Graphs, NLP, Information Extraction, Triplet Extractors
2020 — 2022
M.Tech in ML & Computing
Indian Institute of Space Science and Technology (IIST)
Specialization in statistical learning, computational intelligence, and applied ML.
2016 — 2020
B.Tech in Information Technology
Jabalpur Engineering College
Foundations in algorithms, data structures, and distributed computing systems.

Achievements

2018
ACM ICPC Regional Qualifier
Qualified for the Kolkata-Kanpur site contest at UIET, Kanpur. Ranked among top competitive programmers nationally.
2018
CodeChef Rating 1883
Handle: amit_9 — Strong algorithmic problem-solving across data structures and combinatorics.
2018
Gold Medal, Chess — Avahan
First place in Chess at the Avahan intra-college championship, JEC.
2018
Organized "So You Think You Can Code"
Designed and ran a competitive programming contest on HackerEarth with problem setting and evaluation.

Skills & Technologies

Languages
Core Programming
Python, C++, C, SQL, Bash — deep fluency in systems and scripting for ML infrastructure and backend services.
ML / AI
Frameworks & Models
PyTorch, TensorFlow, Hugging Face, LangChain, LangGraph, vLLM, GGML — from training to production inference.
Infrastructure
Cloud & DevOps
Docker, Kubernetes, GCP (GKE), Git, CI/CD, service meshes — building scalable, reproducible ML platforms.
Data
Storage & Processing
PostgreSQL, SQLite, Redis, Elasticsearch, vector databases — managing structured and unstructured data at scale.