Aditya Dewan

(647) 408-6446 | adewan2@andrew.cmu.edu | LinkedIn | GitHub

About

Quantitative finance and machine learning researcher focused on statistical modeling of financial markets, efficient transformer architectures, and loss-landscape-based knowledge distillation. Aspiring deep learning researcher working towards mastery in the field.

Education

B.S. Computer Science, Mathematics Minor, Machine Learning Concentration, Carnegie Mellon University
May 2028
Pittsburgh, PA

Clubs: Quant Club (Goldman Sachs Quantathon: 100%), CMU Racing (built CUDA SVMs with 2x speedup for GPU midline generation).

Systems Work: Built a malloc implementation (dynamic memory allocator) and a fully concurrent, thread-safe file system in C from scratch.

Gifted Program, The Woodlands Secondary School
Sep. 2020 – Jun. 2024

Selected Coursework

Machine Learning & Systems: 10-723 Generative AI (Ph.D.), 11-785 Deep Learning (Ph.D.), 10-701 Machine Learning (Ph.D.), 15-442 Machine Learning Systems (GPU + Kernel Optimization), 15-210 Parallel Computing + Algorithms, 15-213 Introduction to Computer Systems, 15-122 Data Structures and Algorithms, 15-150 Functional Programming
Math, Theory & Quantitative: 21-301 Combinatorics, 15-259 Probability and Computing, 21-270 Mathematical Finance, 15-251 Theoretical Computer Science/Discrete Math, 15-195 Competitive Programming, 21-241 Matrices and Linear Algebra, 21-295 Putnam Seminar

Research & Experience

Research-heavy ML work and production systems engineering across long-context modeling, diagnostics, and high-reliability infrastructure.

Software Development Engineer Intern, Amazon Web Services (AWS), Amazon DCV Team
May 2026 – Aug. 2026
Seattle, WA
  • Designed a Poisson-process packet-loss probe scheduler with provable guarantees, modeling inter-arrivals and anomaly recurrences to achieve at least 95% detection within seven recurrences; bounded heavy-tail wait behavior via Chernoff-bound analysis and Lambert W-based calibration.
  • Built a real-time C#/.NET diagnostics client (STA/MTA multithreading, async probes, rotating ndjson sessions) streaming 51 live metrics with embedded Gaussian fitting, pairwise correlation heatmaps, scatter/histograms, and session replay; reduced production troubleshooting time by roughly 50% for clients including Netflix, the U.S. Army, and Volkswagen.
  • Architected an end-to-end time-series ML pipeline that injects CPU/GPU/RTT faults, dynamically labels causal signals, and fine-tunes a Tiny Time Mixer with custom anomaly-detection and forecasting heads; implemented root-cause routing for targeted remediation.
Machine Learning Summer Research Intern, Goomba Lab (Mamba Architecture), CMU
May 2025 – Aug. 2025
Pittsburgh, PA
  • Implemented a novel skip-connection-based Mamba model to mitigate multi-head-attention-related retrieval decline in SSM architectures for linear-time LLM settings.
  • Analyzed State Space Models and Mamba-2 gather-and-aggregate bottlenecks on MMLU retrieval benchmarks, including embedding perturbation/projection behavior.
Machine Learning Research Intern, CMU, Dr. David Touretzky
Jul. 2023 – Aug. 2024
Pittsburgh, PA
  • Engineered and deployed a React web application for simulating and visualizing textual Markov chain models, used by 900+ professionals and in a Concord University study (demo).
  • Built efficient graph generation and probabilistic next-word computation pipelines for bigram, trigram, and tetragram models with dynamic rendering/caching for low latency on large corpora.
Award-winning Neural Network Compression Algorithm, Regeneron ISEF
Aug. 2022 – Aug. 2024
Dallas, TX
  • Devised a loss-landscape distillation compression method (SPRKD) yielding 2-24% higher accuracy at roughly 10% of conventional training time in tested configurations.
  • Computed high-descent-potential saddle points in the teacher optimization landscape via Hessian approximation and spectral analysis for rapid curvature-tracking student convergence.
  • Project: SPRKD
Junior Machine Learning Engineer, The Rounds ($40M healthcare startup)
Jul. 2023 – Jan. 2024
Toronto, ON
  • Engineered the company’s first ML backend infrastructure for high server volume and minimum latency using prompt chunking, custom drug vector embeddings, and safety guardrails.
  • Developed an LLM-based drug monograph summarization API and web application for real-time clinical patient diagnosis workflows.
Machine Learning Specialist, Actionable.co (1000+ orgs)
Jul. 2021 – May 2023
Toronto, ON
  • Led a team of three to architect and deploy a prototype recommendation engine API serving 45,000+ clients, optimized for maximal average rating-change uplift.
  • Built a hybrid GAN/Gradient Boosted Tree/MLP model pipeline and production deployment stack with Python, PyTorch, TensorFlow, XGBoost, SQL, and Flask.

Honors and Awards

Selected recognitions across quantitative trading, machine learning research, and international science competitions.

Optiver Market Making Competition
1st Place in Quantitative Trading; first freshman team to ever win.
Hudson River Trading, Best Use of Quantitative Data Award
Awarded for a machine-learning insurance-pricing platform.
U.S. National Security Agency (NSA)
Second Award in Cybersecurity/Mathematics at Regeneron ISEF (1600+ finalists) for neural network compression research.
Regeneron ISEF
One of 8 selected to represent Canada; Fourth Award in Robotics and Intelligent Machines.
Google DeepMind Offer
Selected as a research contractor as a sophomore (declined due to start date conflict).
TEDx Speaker, Innovire Speaker
Mathematical Foundations of ML and AI research talks (linktr.ee/AdiCMU).
Morgan Stanley / Quantbot Trading Competition
Highest Sharpe ratio across all submissions (6th overall).

Projects

Selected technical projects spanning efficient attention, optimization, and applied ML systems.

HFOLD: Dynamic Hidden State Heap Folding for Efficient Long-Context Attention
Introduced an inference-time memory mechanism that augments sliding-window attention with a bounded heap of high-relevance historical states. At each timestep, HFOLD retrieves top-ranked hidden states, reinserts transformed high-attention tokens, and applies a relevance-conditioned FOLD update (via GEM + RSM modules) to preserve information from evicted states. On Pythia 14M/31M evaluation, HFOLD recovered substantial long-range quality lost by plain sliding-window attention in language modeling (perplexity reduction from 225.1 to 91.7 in the full-attention-finetuned setting), while maintaining linear-time asymptotic behavior. The main current limitation is systems-level throughput overhead from non-fused heap operations.
SPRKD (Saddle Point Recruitment for Knowledge Distillation)
2022-2024
Research-driven distillation/compression framework leveraging loss-landscape structure and saddle-point dynamics; improved accuracy by 2-24% with major training-efficiency gains, recognized by NSA, ISEF, and WAICY.
Born-Again Neural Networks
From-scratch reproduction of Born-Again Neural Networks with controlled teacher-student experiments to analyze iterative self-distillation behavior.
Adam Optimization From Scratch
Clean-room implementation of Adam with detailed instrumentation for convergence diagnostics and optimizer behavior analysis.
Custom Maxout Activation Implementation
Numerically stable Maxout implementation and experiments comparing representational behavior against conventional activation families.
Expected Gradient Divergence Weighting (EGDW) for TITANS Memory Updates
2025
Proposed a probabilistic memory-update rule for TITANS using Markov/Jensen bounds to stabilize neural memory writes; achieved improved validation cross-entropy relative to baseline despite higher train loss, indicating stronger generalization.
Autonomous Vehicle Simulator
Behavioral-cloning based autonomous driving stack in simulation, including data collection, model training, and closed-loop trajectory validation.
Symptom Diagnosis AI
Clinical NLP assistant for symptom triage and patient-facing explanation generation, built as an end-to-end inference API with web integration.
Chess AI
Chess engine prototype with search/evaluation pipeline and iterative model tuning to study strategic depth under constrained compute.
Crysta
Student productivity platform MVP centered on non-invasive energy-state estimation, experimentation design, and behavior-adaptive scheduling.

Publications & Articles

AI Education Textbook: Education for the Next Generation: Nurturing Effective Learning

Published on Amazon; peaked at #4 in Neural Networks.

Viewpoint Paper on Knowledge Distillation
Atherma: Solving the Energy Crisis with Nuclear

Talks & Presentations

Mathematical Foundations of ML, AI Research. For all talk links, visit linktr.ee/AdiCMU

TEDx Talk

Innovire Talk

Skills

Languages: Python, C, C++, C#, Java, SML, SQL, JavaScript
Frameworks & Technologies: PyTorch, TensorFlow, .NET, Node.js, Flask, React.js, NumPy, Pandas, XGBoost, CUDA, Git

Contact

Pittsburgh, PA | (647) 408-6446 | adewan2@andrew.cmu.edu

Email | LinkedIn | GitHub | Twitter | Newsletter