                Elliot Epstein
I am an incoming fifth-year PhD student at Stanford in the Institute for Computational and Mathematical Engineering.
                My work focuses on efficient and interpretable machine learning methods for time series and sequence modeling tasks where classical statistical techniques and standard architectures such as Transformers often fail or scale poorly. 
                These include problems with large cross-sectional dimension and tasks involving very long sequences.
                Currently, I am a Quant Research Intern at Jump Trading. 
                I have spent previous summers at Google, most recently on the Gemini team, working on automated evaluation of instruction following in LLMs. 
              Before Stanford, I completed an MS in Mathematical and Computational Finance at Oxford, along with a quant internship at a commodity trading firm. 
               
epsteine@stanford.edu / CV / GitHub / Google Scholar / LinkedIn
Publications

A Set-Sequence Model for Time Series
Elliot L. Epstein, Apaar Sadhwani, Kay Giesecke
 FinAI@ICLR, 2025
arxiv
Score-Debiased Kernel Density Estimation
Elliot L. Epstein*, Rajat Vadiraj Dwaraknath*, Thanawat Sornwanee*, John Winnicki, Jerry Weihong Liu
 FPI@ICLR, 2025
arxiv
MMMT-IF: A Challenging Multimodal Multi-Turn Instruction Following Benchmark
Elliot L. Epstein, Kaisheng Yao, Jing Li, Shoshana Bai, Hamid Palangi
 SFLLM@NeurIPS, 2024
arxiv
Research done during my 2024 internship on the Gemini team at Google.
Simple Hardware-Efficient Long Convolutions for Sequence Modeling
Elliot L. Epstein*, Dan Y. Fu*, Eric Nguyen, Armin W. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré
 ICML, 2023
arxiv / code / blog post
What is the simplest architecture you can use to get good performance on sequence modeling with subquadratic compute scaling in the sequence length? State space models (SSMs) perform well on long-sequence modeling but require sophisticated initialization techniques and specialized implementations to achieve high quality and fast runtimes. This research studies whether directly learning long convolutions over the sequence can match SSMs in performance and efficiency.
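The core idea compresses to a few lines: learn a convolution kernel as long as the input and apply it with FFTs, so compute scales as O(L log L) rather than O(L^2). A minimal PyTorch sketch of this mechanism (illustrative only; the paper's optimized implementation and kernel regularizations are not shown):

```python
import torch

def long_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Convolve inputs u (batch, length) with a learned kernel k (length,)
    via FFT in O(L log L) time instead of O(L^2)."""
    L = u.shape[-1]
    # Zero-pad to 2L so circular FFT convolution matches linear convolution.
    u_f = torch.fft.rfft(u, n=2 * L)
    k_f = torch.fft.rfft(k, n=2 * L)
    return torch.fft.irfft(u_f * k_f, n=2 * L)[..., :L]

u = torch.randn(4, 1024)                   # batch of sequences
k = torch.randn(1024, requires_grad=True)  # kernel as long as the sequence
y = long_conv(u, k)                        # (4, 1024); k is trained by backprop
```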
Ultrasound image analysis using deep neural networks for discriminating between benign and malignant ovarian tumors: comparison with expert subjective assessment
F. Christiansen, Elliot L. Epstein, E. Smedberg, Måns Åkerlund, Kevin Smith, E. Epstein
Ultrasound in Obstetrics & Gynecology, 2021
arxiv
This research develops a method to discriminate benign from malignant ovarian tumors using transfer learning from a model pretrained on ImageNet. The model achieves accuracy comparable to that of a human expert.
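The recipe here is standard transfer learning; a minimal sketch under my own assumptions (the backbone choice and freezing policy are illustrative, not the paper's exact configuration):

```python
import torch.nn as nn
from torchvision import models

# Start from an ImageNet-pretrained backbone.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in model.parameters():
    p.requires_grad = False                    # freeze pretrained features
model.fc = nn.Linear(model.fc.in_features, 2)  # benign vs. malignant head
# Fine-tune the new head (and optionally later layers) on ultrasound images.
```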
Experience

Quantitative Research Intern, Jump Trading
 Jun. 2025 — Aug. 2025
 
 
 
PhD Software Engineering Intern, Google
 Jun. 2024 — Sep. 2024
 
Intern on the Gemini team.
Outcome: research paper “MMMT-IF: A Challenging Multimodal Multi-Turn Instruction Following Benchmark”.

Student Researcher
 Oct. 2023 — Jan. 2024
Worked on an LLM-based dialogue system.
 
 
 Software Engineering Intern
 Jun. 2023 — Sep. 2023
Worked on an LLM-based dialogue system.
Intern, Quant and Data Group, EDF Trading
 Apr. 2021 — Aug. 2021
 
 
- Developed a model in Python to predict the direction of the next trade in day-ahead gas futures with over 70 percent accuracy, using limit order book (LOB) data and an ensemble of LSTM networks trained on multiple GPUs in the cloud.
- Built a web application displaying real-time predictions, from neural network and random forest models, of the 15-minute-ahead closing price of month-ahead gas futures.
- Created a trading environment based on LOB data and used a proximal policy optimization reinforcement learning agent to develop a trading strategy for month-ahead gas futures.
 
 
 
          
Projects
These include coursework, side projects, and unpublished research work.
Robust Domain Adaptation by Adversarial Training and Classification
Stanford CS224N: Natural Language Processing
 Mar. 2022
 
This project extended a method for training a question-answering model on out-of-distribution data with new data augmentation techniques, and added a classifier module to determine whether a question was answerable.
Work done in collaboration with Nicolas Ågnes.
Value Iteration for Markov Decision Processes
Stanford CME 307: Optimization
 Mar. 2022
 
This project formulated an MDP as a linear program and proved a contraction result.
Extensions of value and policy iteration were implemented, their convergence rates were analyzed, and the findings were verified empirically on a simple tic-tac-toe implementation.
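For concreteness, a minimal value-iteration sketch (array shapes and variable names are mine, not the project's code); the γ-contraction of the Bellman operator in the sup norm is what drives the convergence rate analyzed here:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P: (A, S, S) transition probabilities, R: (A, S) rewards.
    The Bellman backup is a gamma-contraction in the sup norm, so the
    iterates converge linearly to the optimal value function V*."""
    V = np.zeros(P.shape[-1])
    while True:
        Q = R + gamma * P @ V               # (A, S): one backup per action
        V_new = Q.max(axis=0)               # greedy maximization over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)  # values and greedy policy
        V = V_new
```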
Multi-Fidelity Hamiltonian Monte Carlo
Stanford
 Aug. 2020
poster
Hamiltonian Monte Carlo improves upon standard MCMC when gradients of the target distribution are available. However, in settings where gradients are unavailable, such as inverse modeling from physical simulations, these methods cannot be applied.
This research demonstrates the efficiency of a new algorithm, Multi-Fidelity Hamiltonian Monte Carlo, based on a neural network surrogate model for the gradient.
Work done as a research assistant with Eric Darve.
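The substitution point is narrow: HMC's leapfrog integrator only consumes a gradient of log p, so a learned surrogate can stand in for an unavailable simulator gradient. A minimal sketch (function names mine; not the poster's exact algorithm, and the Metropolis accept/reject step is omitted):

```python
import numpy as np

def leapfrog(q, p, grad_logp, step=0.01, n_steps=20):
    """One HMC trajectory. grad_logp may be a neural-network surrogate
    for the gradient of log p(q), e.g. when the true gradient of a
    physical simulator is unavailable."""
    p = p + 0.5 * step * grad_logp(q)   # half step in momentum
    for _ in range(n_steps - 1):
        q = q + step * p                # full step in position
        p = p + step * grad_logp(q)     # full step in momentum
    q = q + step * p
    p = p + 0.5 * step * grad_logp(q)   # final half step
    return q, p

# Example with an exact gradient: standard Gaussian, grad log p(q) = -q.
q, p = leapfrog(np.zeros(2), np.random.randn(2), grad_logp=lambda q: -q)
```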
A Review of the Article “Gradient Descent Provably Optimizes Over-parameterized Neural Networks”
ETH Zurich
 Aug. 2020
paper
This work theoretically studied the convergence of a shallow neural network trained with gradient descent.
Using a gradient flow argument, the dynamics of the predictions were analyzed directly, rather than the weights.
Convergence was proved under the condition that the network is polynomially over-parameterized and the least eigenvalue of a data-dependent matrix is positive.
Work supervised by Arnulf Jentzen as part of my bachelor's thesis at ETH Zurich.
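The reviewed argument (Du et al., 2018) can be summarized in one display: with u(t) the vector of network predictions and H(t) the data-dependent Gram matrix, gradient flow gives

```latex
\frac{\mathrm{d}u(t)}{\mathrm{d}t} = -H(t)\bigl(u(t)-y\bigr),
\qquad
H_{ij}(t) = \left\langle \frac{\partial f(x_i;W(t))}{\partial W},
                         \frac{\partial f(x_j;W(t))}{\partial W} \right\rangle,
```

and if over-parameterization keeps \(\lambda_{\min}(H(t)) \ge \lambda_0/2 > 0\) along the trajectory, the training loss decays linearly: \(\|u(t)-y\|_2^2 \le e^{-\lambda_0 t}\,\|u(0)-y\|_2^2\).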
Image Semantic Segmentation Based on Deep Learning
Zhejiang University
 Aug. 2019
 
Using TensorFlow, a Mask R-CNN model was trained and evaluated on a new hand-curated dataset.
Work done with Filip Christiansen as part of a research visit to Zhejiang University in Hangzhou, China.
Education

Stanford University
Ph.D. in Computational and Mathematical Engineering
 Stanford, United States
 2021 — Present
GPA: 4.16/4.3
University of Oxford
MS in Mathematical and Computational Finance
 Oxford, United Kingdom
 2020 — 2021
 
ETH Zurich
Exchange Student, Department of Mathematics
 Zurich, Switzerland
 2019 — 2020
 
KTH Royal Institute of Technology
BS in Engineering Physics
 Stockholm, Sweden
 2017 — 2020
GPA: 4.94/5.00
Teaching

Graduate Teaching Assistantships
Stanford, United States
 
 
- Investment Science: MS&E 245A (Fall 2024)
- Advanced Investment Science: MS&E 245B (Spring 2024)
- Financial Risk Analytics: MS&E 246 (Winter 2024)
- Applied Data Science: CME 218 (Fall 2023). Mentoring Stanford graduate students working on machine learning projects.
- Partial Differential Equations: CME 303 (Fall 2022). A graduate class on partial differential equations.
- Machine Learning: CS 229 (Summer 2022). Topics include supervised learning (deep learning), unsupervised learning, and reinforcement learning.
 
 
 
          
Blog
Short articles on various topics.