Elliot Epstein

I am a final-year PhD student at Stanford in the Institute for Computational and Mathematical Engineering.

My work focuses on efficient and interpretable machine learning methods for time series and sequence modeling tasks where classical statistical techniques and standard architectures such as Transformers often fail or scale poorly. These include problems with large cross-sectional dimension and tasks involving very long sequences.

In the Summer of 2025, I was a Quant Research Intern at Jump Trading. I have spent previous summers at Google, most recently on the Gemini team, working on automated evaluation of instruction following in LLMs. Before Stanford, I completed an MS in Mathematical and Computational Finance at Oxford, along with a quant internship at a commodity trading firm.

epsteine@stanford.edu  /  CV  /  GitHub  /  Google Scholar  /  LinkedIn


Research


LLMs are Overconfident: Evaluating Confidence Interval Calibration with FermiEval


Elliot L. Epstein, John Winnicki, Thanawat Sornwanee, Rajat Vadiraj Dwaraknath
AIR-FM@AAAI (Best Paper Award), 2026
arxiv / slides


Attention Factors for Statistical Arbitrage


Elliot L. Epstein, Jaewon Choi, Rose Wang, Markus Pelger
International Conference on AI in Finance (Oral Presentation), 2025
arxiv / slides


A Set-Sequence Model for Time Series


Elliot L. Epstein, Apaar Sadhwani, Kay Giesecke
FinAI@ICLR, 2025
arxiv


Score-Debiased Kernel Density Estimation


Elliot L. Epstein*, Rajat Vadiraj Dwaraknath*, Thanawat Sornwanee*, John Winnicki*, Jerry Weihong Liu*
NeurIPS, 2025
arxiv / slides


MMMT-IF: A Challenging Multimodal Multi-Turn Instruction Following Benchmark


Elliot L. Epstein, Kaisheng Yao, Jing Li, Shoshana Bai, Hamid Palangi
SFLLM@NeurIPS, 2024
arxiv

Research done during my 2024 internship on the Gemini Team at Google.


Simple Hardware-Efficient Long Convolutions for Sequence Modeling


Elliot L. Epstein*, Dan Y. Fu*, Eric Nguyen, Armin W. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré
ICML, 2023
arxiv / code / blog post

What is the simplest architecture that achieves strong sequence-modeling performance with subquadratic compute scaling in the sequence length? State space models (SSMs) perform well on long-sequence modeling, but they require sophisticated initialization techniques and specialized implementations to reach high quality and fast runtimes. This research studies whether directly learning long convolutions over the sequence can match SSMs in performance and efficiency.
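The core primitive is easy to sketch. Below is a minimal NumPy illustration of applying a sequence-length convolution kernel in O(L log L) time via the FFT; this is a toy sketch of the idea, not the paper's implementation (which learns the kernel end to end and uses a fused GPU kernel):

```python
import numpy as np

def long_conv(u, k):
    """Causal convolution of input u with a kernel k as long as the
    sequence itself, computed in O(L log L) time via the FFT."""
    L = len(u)
    # Zero-pad to 2L so circular convolution equals linear convolution.
    n = 2 * L
    y = np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(k, n), n)
    return y[:L]

# Toy example: a delay-by-one kernel shifts the sequence right.
u = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([0.0, 1.0, 0.0, 0.0])
print(long_conv(u, k))  # → [0. 1. 2. 3.]
```

In this framing, the question the paper studies is whether treating `k` as a directly learned parameter (with suitable regularization of the kernel) can match carefully initialized SSM parameterizations.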


Ultrasound image analysis using deep neural networks for discriminating between benign and malignant ovarian tumors: comparison with expert subjective assessment


F Christiansen, Elliot L. Epstein, E Smedberg, Mans Akerlund, Kevin Smith, E Epstein
Ultrasound in Obstetrics & Gynecology, 2021
arxiv

This research develops a method to discriminate benign from malignant ovarian tumors using transfer learning from a model pretrained on ImageNet. The model achieves accuracy comparable to that of expert examiners.
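The transfer-learning recipe here (a frozen pretrained feature extractor plus a newly trained classification head) can be sketched in a few lines. The snippet below is a toy NumPy illustration only: a fixed random projection stands in for the ImageNet-pretrained backbone, and the two-class data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained backbone: a fixed random projection.
# In the paper's setting this would be a CNN pretrained on ImageNet.
W_frozen = rng.normal(size=(64, 16))

def features(x):
    return np.tanh(x @ W_frozen)  # frozen: never updated during training

# Synthetic two-class data (two Gaussian blobs) as a stand-in for
# benign vs. malignant ultrasound images.
X = np.vstack([rng.normal(-1, 1, (100, 64)), rng.normal(1, 1, (100, 64))])
y = np.array([0] * 100 + [1] * 100)

# Train only a new logistic-regression head on the frozen features.
F = features(X)
w, b = np.zeros(16), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(F @ w + b)))   # predicted probabilities
    g = p - y                            # logistic-loss gradient signal
    w -= 0.1 * F.T @ g / len(y)
    b -= 0.1 * g.mean()

acc = ((F @ w + b > 0) == y).mean()
print(f"train accuracy: {acc:.2f}")
```

Freezing the backbone and fitting only the small head is attractive when labeled medical images are scarce, since very few parameters need to be estimated.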




Internships

Quantitative Research Intern


Jump Trading
Jun. 2025 — Aug. 2025



PhD Software Engineering Intern


Google
Jun. 2024 — Sep. 2024

Intern on the Gemini team. Outcome: the research paper “MMMT-IF: A Challenging Multimodal Multi-Turn Instruction Following Benchmark”.

Student Researcher



Oct. 2023 — Jan. 2024

Worked on an LLM-based dialogue system.

Software Engineering Intern



Jun. 2023 — Sep. 2023

Worked on an LLM-based dialogue system.

Intern, Quant and Data Group


EDF Trading
Apr. 2021 — Aug. 2021

  • Developed a model in Python to predict the direction of the next trade in day-ahead gas futures with over 70% accuracy, using limit order book (LOB) data and an ensemble of LSTM networks trained on multiple cloud GPUs.
  • Built a web application displaying real-time predictions of the 15-minute-ahead closing price of month-ahead gas futures from neural network and random forest models.
  • Created a trading environment based on LOB data and trained a proximal policy optimization (PPO) reinforcement learning agent to produce a trading strategy for month-ahead gas futures.



Education


Stanford University


Ph.D. in Computational and Mathematical Engineering
Stanford, United States
2021 — Present
GPA: 4.10/4.3


University of Oxford


MS in Mathematical and Computational Finance
Oxford, United Kingdom
2020 — 2021


ETH Zurich


Exchange Student, Department of Mathematics
Zurich, Switzerland
2019 — 2020


KTH Royal Institute of Technology


BS in Engineering Physics
Stockholm, Sweden
2017 — 2020
GPA: 4.94/5.00




Teaching


Graduate Teaching Assistantships


Stanford, United States

  • Investment Science: MS&E 245A (Fall 2024)
  • Advanced Investment Science: MS&E 245B (Spring 2024)
  • Financial Risk Analytics: MS&E 246 (Winter 2024)
  • Applied Data Science: CME 218 (Fall 2023)
    Mentoring Stanford graduate students working on machine learning projects.
  • Partial Differential Equations: CME 303 (Fall 2022)
    A graduate class on partial differential equations.
  • Machine Learning: CS 229 (Summer 2022)
    Topics include supervised learning (deep learning), unsupervised learning, and reinforcement learning.



Blog

Short articles on various topics.


Blog Posts


Ironman Italy 2024 - Race Report
Sep. 29, 2024
Training and race day report.

Atop Chimborazo: A Journey to 20,600 Feet - Trip Report
Jan. 15, 2024
The peak of Chimborazo is the farthest point from the center of the Earth.

Working on a GPU Cluster: A Practical Setup Guide
Sep. 30, 2023
A step-by-step guide to setting up the development environment for a GPU cluster. The tutorial is done on the Stanford ICME GPU cluster.

Creating Your Own Personal Website: A Step-by-Step Guide Using Jekyll and GitHub Pages
May 29, 2023
The steps I took to create this website and how to create your own website in a similar style.

Simple Long Convolutions for Sequence Modeling
Feb. 15, 2023
What is the simplest architecture with subquadratic scaling in the sequence length that does well on a range of sequence modeling tasks?


Design and source code from Leonid Keselman's website