Aditya Dewan
About
Quantitative finance and machine learning researcher focused on statistical modeling of financial markets, efficient transformer architectures, and loss-landscape-based knowledge distillation. Aspiring deep learning researcher working towards mastery in the field.
Education
Clubs: Quant Club (Goldman Sachs Quantathon: 100%), CMU Racing (built CUDA SVMs with 2x speedup for GPU midline generation).
Systems Work: Built a malloc-style dynamic memory allocator and a fully concurrent, thread-safe file system in C from scratch.
Selected Coursework
Research & Experience
Research-heavy ML work and production systems engineering across long-context modeling, diagnostics, and high-reliability infrastructure.
- Designed a Poisson-process packet-loss probe scheduler with provable guarantees, modeling inter-arrivals and anomaly recurrences to achieve at least 95% detection within seven recurrences; bounded heavy-tail wait behavior via Chernoff-bound analysis and Lambert W-based calibration.
- Built a real-time C#/.NET diagnostics client (STA/MTA multithreading, async probes, rotating ndjson sessions) streaming 51 live metrics with embedded Gaussian fitting, pairwise correlation heatmaps, scatter/histograms, and session replay; reduced production troubleshooting time by roughly 50% for clients including Netflix, the U.S. Army, and Volkswagen.
- Architected an end-to-end time-series ML pipeline that injects CPU/GPU/RTT faults, dynamically labels causal signals, and fine-tunes a Tiny Time Mixer with custom anomaly-detection and forecasting heads; implemented root-cause routing for targeted remediation.
- Implemented a novel skip-connection-based Mamba model to mitigate multi-head-attention-related retrieval decline in SSM architectures for linear-time LLM settings.
- Analyzed State Space Models and Mamba-2 gather-and-aggregate bottlenecks on MMLU retrieval benchmarks, including embedding perturbation/projection behavior.
- Engineered and deployed a React web application for simulating and visualizing textual Markov chain models; used by 900+ professionals and in a Concord University study.
- Built efficient graph generation and probabilistic next-word computation pipelines for bigram, trigram, and tetragram models with dynamic rendering/caching for low latency on large corpora.
- Devised a loss-landscape distillation compression method (SPRKD) yielding 2-24% higher accuracy at roughly 10% of conventional training time in tested configurations.
- Computed high-descent-potential saddle points in the teacher optimization landscape via Hessian approximation and spectral analysis for rapid curvature-tracking student convergence.
- Project: SPRKD
- Engineered the company’s first ML backend infrastructure for high server volume and minimum latency using prompt chunking, custom drug vector embeddings, and safety guardrails.
- Developed an LLM-based drug monograph summarization API and web application for real-time clinical patient diagnosis workflows.
- Led a team of three to architect and deploy a recommendation engine API serving 45,000+ clients during prototyping, optimized for maximal average rating-change uplift.
- Built a hybrid GAN/Gradient Boosted Tree/MLP model pipeline and production deployment stack with Python, PyTorch, TensorFlow, XGBoost, SQL, and Flask.
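As a rough illustration of the probe-scheduler bullet above (95% detection within seven recurrences), the sketch below works through the simplest version of that calculation. It is not the production scheduler: it assumes a homogeneous Poisson probe stream, anomaly recurrences that are disjoint windows of a fixed, known duration, and that any probe landing inside a window detects the anomaly. The function names and the 2-second window are hypothetical, and the Chernoff-bound and Lambert W calibration mentioned above are not modeled here.

```python
import math


def detection_probability(rate: float, recurrences: int, window_s: float) -> float:
    """P(at least one probe lands in at least one anomaly window).

    Assumes probes follow a homogeneous Poisson process with `rate` probes
    per second and that the `recurrences` windows are disjoint, each lasting
    `window_s` seconds. Probe counts in disjoint windows are independent, so
    P(miss every window) = exp(-rate * window_s) ** recurrences.
    """
    return 1.0 - math.exp(-rate * window_s * recurrences)


def min_probe_rate(target: float, recurrences: int, window_s: float) -> float:
    """Smallest probe rate that reaches `target` detection probability."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    if recurrences <= 0 or window_s <= 0:
        raise ValueError("recurrences and window_s must be positive")
    # Solve exp(-rate * window_s * recurrences) <= 1 - target for rate.
    return -math.log(1.0 - target) / (recurrences * window_s)


if __name__ == "__main__":
    # Hypothetical numbers: 2-second anomaly windows, 95% within 7 recurrences.
    rate = min_probe_rate(target=0.95, recurrences=7, window_s=2.0)
    print(f"minimum probe rate: {rate:.4f} probes/s")
    print(f"detection probability: {detection_probability(rate, 7, 2.0):.4f}")
```

Under these assumptions the per-recurrence hit probability only needs to be about 0.35, since missing seven independent windows in a row is already unlikely at that level.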
Honors and Awards
Selected recognitions across quantitative trading, machine learning research, and international science competitions.
Projects
Selected technical projects spanning efficient attention, optimization, and applied ML systems.
Publications & Articles
AI Education Textbook: Education for the Next Generation: Nurturing Effective Learning
Published on Amazon; peaked at #4 in Neural Networks.
Talks & Presentations
Mathematical Foundations of ML, AI Research. For all talk links, visit linktr.ee/AdiCMU
TEDx Talk
Innovire Talk
News
Skills
Contact
Pittsburgh, PA | (647) 408-6446 | adewan2@andrew.cmu.edu