Building the infrastructure for autonomous intelligence.
I design and ship production-scale AI systems — from high-throughput inference engines to multi-agent orchestration frameworks. Focused on the intersection of systems engineering and frontier AI research.
What I Build
Marvin
High-performance inference backbone serving heterogeneous LLM deployments. Optimized for throughput via dynamic quantization, speculative decoding, and intelligent KV cache management across diverse model architectures.
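To illustrate the speculative-decoding idea mentioned above, here is a minimal sketch of the draft/verify loop. The lookup-table "models" (`draft_table`, `target_table`) are toy stand-ins invented for this example, not Marvin's actual interfaces: a cheap draft model proposes a burst of tokens, and the expensive target model verifies them in one pass, accepting the longest agreeing prefix.

```python
def draft_propose(prefix, k, draft_table):
    """Cheap draft model: propose k tokens via greedy next-token lookup."""
    out, last = [], prefix[-1]
    for _ in range(k):
        nxt = draft_table.get(last, 0)
        out.append(nxt)
        last = nxt
    return out

def target_verify(prefix, proposed, target_table):
    """Expensive target model checks the proposals in one batch and accepts
    the longest prefix it agrees with, then emits one token of its own."""
    accepted, last = [], prefix[-1]
    for tok in proposed:
        if tok != target_table.get(last, 0):
            break
        accepted.append(tok)
        last = tok
    accepted.append(target_table.get(last, 0))  # target's own contribution
    return accepted

def speculative_decode(prompt, steps, k, draft_table, target_table):
    """Each step costs one target pass but can emit up to k+1 tokens."""
    seq = list(prompt)
    for _ in range(steps):
        proposed = draft_propose(seq, k, draft_table)
        seq.extend(target_verify(seq, proposed, target_table))
    return seq
```

When draft and target agree, each verification pass yields k+1 tokens; when they diverge, the loop falls back to one correct target token, so output always matches what the target alone would produce.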
Eddie
Multi-agent orchestration framework for autonomous task solving. Features hierarchical reasoning with stateful agent collaboration, robust guardrails, and persistent state management for complex, long-horizon enterprise workflows.
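The plan/dispatch/checkpoint shape described above can be sketched in a few lines. Everything here (`plan`, `guardrail`, the `workers` routing convention, the `state` dict) is a hypothetical simplification for illustration, not Eddie's real API:

```python
def plan(goal):
    """Toy planner: split a goal string into ordered subtasks."""
    return [s.strip() for s in goal.split(";") if s.strip()]

def guardrail(result):
    """Stand-in policy check: reject outputs carrying a blocked marker."""
    return "FORBIDDEN" not in result

def run_workflow(goal, workers, state=None):
    """Plan -> dispatch each subtask to a worker agent -> checkpoint state.
    Persisted state lets a long-horizon workflow resume after interruption."""
    state = state if state is not None else {"done": [], "failed": []}
    for task in plan(goal):
        if task in state["done"]:  # resume support: skip finished steps
            continue
        agent = workers.get(task.split()[0], workers["default"])
        result = agent(task)
        if guardrail(result):
            state["done"].append(task)
        else:
            state["failed"].append(task)
    return state
```

Passing the returned `state` back into `run_workflow` is the toy analogue of persistent state management: completed subtasks are skipped on re-entry rather than re-executed.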
Vortex
Kubernetes-native AI platform toolkit for multi-tenant inference clusters. Provides unified request routing, zero-trust service mesh, and automated GPU workload scaling with CLI-driven lifecycle management.
Frontier AI Papers
Regular deep-dives into frontier research — from infinite-context architectures and memory-efficient training to RLHF alignment. Each reading comes with a detailed presentation and implementation insights.
Clinical Trial Chatbot
Intelligent Q&A system for clinical trial documents featuring named entity recognition for adverse drug events, Haystack-based retrieval, and an LLM-powered dashboard for interactive exploration of medical literature.
Drug Consumption Forecasting
Predictive analytics pipeline for pharmaceutical demand planning using ensemble time-series methods. Achieved MAPE under 20% by combining ARIMA, Holt-Winters exponential smoothing, and XGBoost regressors.
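As a flavor of the ensemble approach, here is a dependency-free sketch: a simple exponential-smoothing forecaster (a stripped-down relative of Holt-Winters, without trend or seasonality) blended with a persistence baseline, scored by MAPE. The weights and components are illustrative, not the pipeline's actual configuration:

```python
def ses_forecast(series, alpha=0.5):
    """Simple exponential smoothing: recursive level update,
    one-step-ahead forecast is the final level."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def naive_forecast(series):
    """Persistence baseline: repeat the last observation."""
    return series[-1]

def ensemble_forecast(series, weights=(0.5, 0.5)):
    """Weighted blend of the component forecasts."""
    components = (ses_forecast(series), naive_forecast(series))
    return sum(w * f for w, f in zip(weights, components))

def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - f) / abs(a)
                       for a, f in zip(actuals, forecasts)) / len(actuals)
```

The real pipeline swaps in ARIMA, full Holt-Winters, and XGBoost for the components, but the blend-then-score structure is the same.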
Abnormal Event Detection in Video
Real-time anomaly detection system for surveillance video streams, achieving 95% precision on the UCSD Anomaly Detection Dataset. Combines spatial-temporal feature extraction with unsupervised learning.
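A minimal sketch of the unsupervised thresholding idea: frame-difference motion energy (a crude stand-in for the system's spatial-temporal features) compared against running statistics of recent frames. The window size and threshold multiplier are illustrative assumptions:

```python
def motion_energy(prev, curr):
    """Sum of absolute pixel differences between consecutive frames."""
    return sum(abs(a - b)
               for row_p, row_c in zip(prev, curr)
               for a, b in zip(row_p, row_c))

def detect_anomalies(frames, window=5, k=3.0):
    """Flag frame indices whose motion energy exceeds mean + k*std of the
    preceding window of energies -- unsupervised, no labels required."""
    energies, flagged = [], []
    for i in range(1, len(frames)):
        e = motion_energy(frames[i - 1], frames[i])
        ref = energies[-window:]
        if len(ref) >= 2:
            mu = sum(ref) / len(ref)
            std = (sum((x - mu) ** 2 for x in ref) / len(ref)) ** 0.5
            if e > mu + k * std:
                flagged.append(i)
        energies.append(e)
    return flagged
```

Because the threshold adapts to the recent distribution, scenes with steady motion are not flagged; only a sudden departure from the local norm is.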
"The future of AI isn't just smarter models — it's smarter systems. I believe the most impactful work happens at the intersection of rigorous engineering and frontier research, where ideas are stress-tested against production reality."
Research Talks & Deep Dives
Modern Language Model Architectures: From Papers to Practice
Decoding architectural patterns across 19+ LLMs — what actually works and why
Fast and Simplex: 2-Simplicial Attention in Triton
Rethinking transformer attention with trilinear forms for better scaling laws under token constraints
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Efficient long-context modeling at 64K+ tokens with up to 9x speedup over FlashAttention-2
StreamingLLM: Efficient Streaming Language Models with Attention Sinks
Enabling infinite-context inference without KV cache explosion
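The eviction policy behind attention sinks is simple enough to sketch. This toy cache (names and sizes are assumptions for illustration, not the paper's code) permanently retains the first few "sink" positions and keeps only a sliding window of recent entries, so memory stays bounded however long the stream runs:

```python
from collections import deque

class SinkKVCache:
    """Rolling KV cache in the attention-sinks style: the first n_sink
    positions are kept forever, the rest live in a bounded sliding window."""
    def __init__(self, n_sink=4, window=8):
        self.n_sink = n_sink
        self.sinks = []                      # initial tokens, never evicted
        self.recent = deque(maxlen=window)   # auto-evicts oldest on append

    def append(self, kv):
        if len(self.sinks) < self.n_sink:
            self.sinks.append(kv)
        else:
            self.recent.append(kv)

    def view(self):
        """Cache entries visible to attention at the current step."""
        return self.sinks + list(self.recent)
```

The paper's observation is that those early sink positions absorb a disproportionate share of attention mass, so dropping them (as a plain sliding window would) degrades generation badly.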
AlphaGeometry: AI for Olympiad-Level Geometry
Neuro-symbolic system combining language models with symbolic deduction
Infini-Attention: Infinite Context Transformers
Compressive memory for infinite context length in transformers
GaLore: Memory-Efficient LLM Training
Reducing fine-tuning memory footprint via gradient projection
WARP: Weight Averaged Rewarded Policies for RLHF
Improving reward stability and alignment in RLHF fine-tuning
LARS & Efficient Optimizer Landscapes
Layer-wise adaptive rate scaling and large-batch convergence