Elliot Epstein

I am a third year PhD student at the Institute of Computational and Mathematical Engineering at Stanford University.

Research interests: Long sequence modeling, machine learning, and statistics, with applications in medicine and finance.

At the University of Oxford, I obtained an MS in Mathematical and Computational Finance. I have a BS in Engineering Physics from KTH Royal Institute of Technology, during which I spent one year at the Mathematics Department of ETH Zurich.

I am working part-time as a Student Researcher at Google.

Email  /  CV  /  GitHub  /  Google Scholar  /  LinkedIn




Simple Hardware-Efficient Long Convolutions for Sequence Modeling

Elliot L. Epstein*, Dan Y. Fu*, Eric Nguyen, Armin W. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré
ICML, 2023
arxiv / code / blog post /

What is the simplest architecture you can use to get good performance on sequence modeling with subquadratic compute scaling in the sequence length? State space models (SSMs) have high performance on long sequence modeling but require sophisticated initialization techniques and specialized implementations for high quality and runtime performance. This research studies whether directly learning long convolutions over the sequence can match SSMs in performance and efficiency.
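The key primitive here, a convolution whose filter is as long as the input, can be evaluated in O(N log N) rather than O(N²) time using the FFT. A minimal NumPy sketch (the filter is a random stand-in for a learned kernel; this illustrates the operation, not the paper's full model):

```python
import numpy as np

def long_conv(u, k):
    """Causal convolution of a signal u with a kernel k as long as the
    sequence, via FFT: O(N log N) instead of O(N^2)."""
    n = len(u)
    # Zero-pad to 2n so circular convolution equals linear convolution.
    fft_size = 2 * n
    u_f = np.fft.rfft(u, n=fft_size)
    k_f = np.fft.rfft(k, n=fft_size)
    return np.fft.irfft(u_f * k_f, n=fft_size)[:n]

rng = np.random.default_rng(0)
n = 1024
u = rng.standard_normal(n)   # input sequence
k = rng.standard_normal(n)   # long filter (random stand-in for a learned one)
y = long_conv(u, k)
# Matches direct convolution up to numerical error.
assert np.allclose(y, np.convolve(u, k)[:n])
```

The zero-padding is what makes the FFT product implement an ordinary (linear) convolution rather than a circular one.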


Ultrasound image analysis using deep neural networks for discriminating between benign and malignant ovarian tumors: comparison with expert subjective assessment

F. Christiansen, Elliot L. Epstein, E. Smedberg, M. Åkerlund, K. Smith, E. Epstein
Ultrasound in Obstetrics & Gynecology, 2021
arxiv /

This research develops a method to discriminate benign from malignant ovarian tumors using transfer learning from a model pretrained on ImageNet. The model achieves an accuracy comparable to that of a human expert.


Work done during internships.


Student Researcher

Sep. 2023 — Present


Software Engineering Intern

Jun. 2023 — Sep. 2023


Intern, Quant and Data Group

EDF Trading
Apr. 2021 — Aug. 2021

  • Developed a model in Python to predict the direction of the next trade of day-ahead gas futures with over 70 percent accuracy, using limit order book (LOB) data and an ensemble of LSTM networks trained on multiple GPUs in the cloud
  • Built a web application displaying real-time predictions of the 15-minute-ahead closing price of month-ahead gas futures from neural network and random forest models
  • Created a trading environment based on LOB data and trained a proximal policy optimization reinforcement learning agent to produce a trading strategy for month-ahead gas futures

Other Projects

These include coursework, side projects and unpublished research work.


Robust Domain Adaptation by Adversarial Training and Classification

Stanford CS224N: Natural Language Processing
Mar. 2022

This project extended a method for training a question answering model on out-of-distribution data with new data augmentation techniques, and added a classifier module to determine whether a question was answerable. Work done in collaboration with Nicolas Ågnes.


Value Iteration for Markov Decision Processes

Stanford CME 307: Optimization
Mar. 2022

This project formulated an MDP as a linear program and proved a contraction result. Extensions of value and policy iteration were implemented, their convergence rates analyzed, and the findings empirically verified on a simple tic-tac-toe implementation.
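The contraction property is what makes value iteration converge: the Bellman optimality operator is a γ-contraction in the sup-norm, so repeated application converges geometrically to the optimal value function. A minimal NumPy sketch on a made-up two-state, two-action MDP (illustrative only, not the tic-tac-toe setup from the project):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a finite MDP.
    P: transition tensor of shape (A, S, S); R: rewards of shape (A, S)."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Bellman update: Q(a,s) = R(a,s) + gamma * sum_s' P(a,s,s') V(s')
        Q = R + gamma * P @ V            # shape (A, S)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)   # optimal values, greedy policy
        V = V_new

# Toy two-state, two-action MDP (made-up numbers for illustration).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
V, pi = value_iteration(P, R)
```

Because the operator is a γ-contraction, the error shrinks by at least a factor γ per sweep, which is the geometric convergence rate analyzed in the project.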


Multi-Fidelity Hamiltonian Monte Carlo

Aug. 2020
poster /

Hamiltonian Monte Carlo improves upon standard MCMC when gradients of the probability distribution are available. In settings where gradients are unavailable, however, such as inverse modeling from physical simulations, these methods cannot be applied. This research demonstrates the efficiency of a new algorithm, Multi-Fidelity Hamiltonian Monte Carlo, based on a neural network surrogate model for the gradient. Work done as a research assistant with Eric Darve.
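A minimal NumPy sketch of the underlying idea: HMC with a pluggable gradient function (which in the multi-fidelity setting would be the neural network surrogate) and a Metropolis correction against the exact density. The target here is just a standard normal, not a simulation posterior, and this is a generic HMC sketch rather than the project's algorithm:

```python
import numpy as np

def leapfrog(q, p, grad_logp, step, n_steps):
    """Leapfrog integrator; grad_logp may be a cheap surrogate gradient."""
    p = p + 0.5 * step * grad_logp(q)
    for _ in range(n_steps - 1):
        q = q + step * p
        p = p + step * grad_logp(q)
    q = q + step * p
    p = p + 0.5 * step * grad_logp(q)
    return q, p

def hmc_sample(logp, grad_logp, q0, n_samples, step=0.1, n_steps=20, seed=0):
    rng = np.random.default_rng(seed)
    q, samples = np.asarray(q0, dtype=float), []
    for _ in range(n_samples):
        p = rng.standard_normal(q.shape)
        q_new, p_new = leapfrog(q, p, grad_logp, step, n_steps)
        # Accept/reject with the exact density keeps the chain correct
        # even when the proposal used an approximate gradient.
        log_accept = (logp(q_new) - 0.5 * p_new @ p_new) \
                   - (logp(q) - 0.5 * p @ p)
        if np.log(rng.uniform()) < log_accept:
            q = q_new
        samples.append(q.copy())
    return np.array(samples)

# Standard normal target; the "surrogate" gradient here is the exact one.
logp = lambda q: -0.5 * q @ q
grad = lambda q: -q
samples = hmc_sample(logp, grad, np.zeros(2), 2000)
```

The Metropolis step is the design point: a surrogate gradient only affects proposal quality (and hence efficiency), not the stationary distribution.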


A Review of the Article "Gradient Descent Provably Optimizes Over-parameterized Neural Networks"

ETH Zurich
Aug. 2020
paper /

This work theoretically studied convergence of a shallow neural network trained with gradient descent. Using a gradient flow argument, the dynamics of the predictions, rather than the weights, were analyzed directly. Convergence was proved under the condition that the network is polynomially over-parameterized and the smallest eigenvalue of a data-dependent matrix is positive. Work supervised by Arnulf Jentzen as part of my bachelor thesis at ETH Zurich.
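The gradient-flow argument can be summarized in one equation (following the reviewed article's setup for a width-$m$ ReLU network; the notation below is a paraphrase, not a quote): the predictions $u(t)$ evolve as

```latex
\frac{\mathrm{d}u(t)}{\mathrm{d}t} = -H(t)\bigl(u(t)-y\bigr),
\qquad
H_{ij}(t) = \frac{1}{m}\sum_{r=1}^{m} x_i^{\top}x_j\,
\mathbf{1}\{w_r(t)^{\top}x_i \ge 0\}\,\mathbf{1}\{w_r(t)^{\top}x_j \ge 0\},
```

so if the smallest eigenvalue $\lambda_0$ of the limiting matrix $H^{\infty}$ is positive, one obtains $\|u(t)-y\|_2^2 \le e^{-\lambda_0 t}\|u(0)-y\|_2^2$, i.e. linear convergence of the predictions to the labels.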


Image Semantic Segmentation based on Deep Learning

Zhejiang University
Aug. 2019

Using TensorFlow, the Mask R-CNN model was trained and evaluated on a new hand-curated dataset. Work done with Filip Christiansen as part of a research visit at Zhejiang University in Hangzhou, China.



Stanford University

Ph.D. in Computational and Mathematical Engineering
Stanford, United States
2021 — Present
GPA: 4.18/4.3


University of Oxford

MS in Mathematical and Computational Finance
Oxford, United Kingdom
2020 — 2021

  • Stochastic Calculus
  • Financial Statistics
  • Advanced Monte Carlo Methods
  • Financial Derivatives
  • Quantitative Risk Management
  • Algorithmic Trading
  • Stochastic Control
  • Asset Pricing
  • Numerical Methods
  • Financial Computing in C++
  • Deep Learning

ETH Zurich

Exchange Student, Department of Mathematics
Zurich, Switzerland
2019 — 2020

  • Probability Theory
  • Programming Techniques for Scientific Simulations
  • Electrodynamics
  • Mathematical Foundations for Finance
  • Geophysical Fluid Dynamics

KTH Royal Institute of Technology

BS in Engineering Physics
Stockholm, Sweden
2017 — 2020
GPA: 4.94/5.00

  • Real Analysis
  • Complex Analysis
  • Quantum Mechanics
  • Algorithms and Data Structures
  • Probability and Statistics



Applied Data Science (CME 218)

Stanford, United States
Fall 2023

Mentoring Stanford graduate students working on machine learning projects.


Partial Differential Equations of Applied Math (MATH 220)

Stanford, United States
Fall 2022
website /

A graduate class on partial differential equations.


Machine Learning (CS 229)

Stanford, United States
Summer 2022
website /

Topics include supervised learning (including deep learning), unsupervised learning, and reinforcement learning.


Short articles on various topics.


Blog Posts

Working on a GPU Cluster: A Practical Setup Guide
Sep. 30, 2023
A step-by-step guide to setting up the development environment for a GPU cluster. The tutorial is done on the Stanford ICME GPU cluster.

Creating Your Own Personal Website: A step-by-step guide using Jekyll and Github Pages
May 29, 2023
The steps I took to create this website and how to create your own website in a similar style.

Simple Long Convolutions for Sequence Modeling
Feb. 15, 2023
What is the simplest architecture with sub-quadratic scaling in the sequence length that performs well on a range of sequence modeling tasks?

Design and source code from Leonid Keselman's website