|
Simple Hardware-Efficient Long Convolutions for Sequence Modeling
Elliot L. Epstein*, Dan Y. Fu*, Eric Nguyen, Armin W. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré
ICML, 2023
arxiv / code / blog post
What is the simplest architecture you can use to get good performance on sequence modeling with subquadratic compute scaling in the sequence length? State space models (SSMs) have high performance on long sequence modeling but require sophisticated initialization techniques and specialized implementations for high quality and runtime performance. This research studies whether directly learning long convolutions over the sequence can match SSMs in performance and efficiency.
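As an illustration of why a long convolution can scale subquadratically in the sequence length, here is a minimal NumPy sketch of an FFT-based causal convolution. It is not the paper's implementation; the function names `long_conv_fft` and `long_conv_direct` are hypothetical, and the kernel is assumed to be no longer than the input.

```python
import numpy as np

def long_conv_fft(u, k):
    """Causal convolution of input u with kernel k in O(L log L) time.

    Assumes u and k are 1-D arrays and len(k) <= len(u).
    """
    L = u.shape[-1]
    n = 2 * L  # zero-pad so the circular convolution matches the linear one
    y = np.fft.irfft(np.fft.rfft(u, n=n) * np.fft.rfft(k, n=n), n=n)
    return y[:L]  # keep only the first L (causal) outputs

def long_conv_direct(u, k):
    """Reference O(L^2) convolution, used only to check the FFT version."""
    return np.convolve(u, k)[: u.shape[-1]]

rng = np.random.default_rng(0)
u = rng.standard_normal(1024)  # input sequence
k = rng.standard_normal(1024)  # kernel as long as the sequence
print(np.allclose(long_conv_fft(u, k), long_conv_direct(u, k)))
```

The zero-padding to length 2L is what prevents wrap-around from the circular convolution that the FFT computes.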
|
|
Ultrasound image analysis using deep neural networks for discriminating between benign and malignant ovarian tumors: comparison with expert subjective assessment
F Christiansen, Elliot L. Epstein, E Smedberg, Mans Akerlund, Kevin Smith, E Epstein
Ultrasound in Obstetrics & Gynecology, 2021
arxiv
This research develops a method to discriminate benign from malignant ovarian tumors using transfer learning from a model pretrained on ImageNet. The model achieves accuracy comparable to that of a human expert.
|
Internships
Work done during internships.
|
|
PhD Software Engineering Intern
Google
Jun. 2024 — Sep. 2024
Outcome: Research paper “MMMT-IF: A Challenging Multimodal Multi-Turn Instruction Following Benchmark”
Student Researcher
Oct. 2023 — Jan. 2024
Worked on an LLM-based chatbot.
Software Engineering Intern
Jun. 2023 — Sep. 2023
|
|
Intern, Quant and Data Group
EDF Trading
Apr. 2021 — Aug. 2021
- Developed a Python model that predicts the direction of the next trade in day-ahead gas futures with over 70 percent accuracy, using limit order book (LOB) data and an ensemble of LSTM networks trained on multiple GPUs in the cloud.
- Built a web application that displays real-time predictions from neural network and random forest models of the 15-minute-ahead closing price of month-ahead gas futures.
- Created a trading environment from LOB data and used a proximal policy optimization (PPO) reinforcement learning agent to learn a trading strategy for month-ahead gas futures.
|
Projects
These include coursework, side projects and unpublished research work.
|
|
Robust Domain Adaptation by Adversarial Training and Classification
Stanford CS224N: Natural Language Processing
Mar. 2022
This project extended a method for training a question answering model on out-of-distribution data with new data augmentation techniques, and added a classifier module that determines whether a question is answerable.
Work done in collaboration with Nicolas Ågnes.
|
|
Value Iteration for Markov Decision Processes
Stanford CME 307: Optimization
Mar. 2022
This project formulated an MDP as a linear program and proved a contraction result.
Extensions of value and policy iteration were implemented, their convergence rates were analyzed, and the findings were verified empirically on a simple tic-tac-toe implementation.
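For readers unfamiliar with the method, the following is a minimal sketch of standard value iteration on a toy two-state MDP. It is not the project's code: the function name `value_iteration`, the array layout, and the toy problem are assumptions made for illustration, and the project's extensions and tic-tac-toe setup are not reproduced.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Standard value iteration.

    P: (A, S, S) array, P[a, s, t] = probability of moving s -> t under action a.
    R: (S, A) array of immediate rewards.
    Returns the value function and a greedy policy.
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # Bellman optimality backup: Q[s, a] = R[s, a] + gamma * E[V(next state)]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)

# Toy MDP (hypothetical): state 1 is absorbing and pays 2 per step;
# in state 0, action 0 stays (reward 0) and action 1 moves to state 1 (reward 1).
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [0.0, 1.0]]])
R = np.array([[0.0, 1.0],
              [2.0, 2.0]])
V, policy = value_iteration(P, R, gamma=0.5)
print(V, policy)
```

Because the Bellman backup is a contraction with modulus gamma in the sup norm, the iterates converge geometrically, which is why a simple sup-norm stopping rule suffices here.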
|
|
Multi-Fidelity Hamiltonian Monte Carlo
Stanford
Aug. 2020
poster
Hamiltonian Monte Carlo improves upon standard MCMC when gradients of the probability distribution are available. However, in settings where the gradients are unavailable, such as inverse modeling from physical simulations, it cannot be applied directly.
This research demonstrates the efficiency of a new algorithm, Multi-Fidelity Hamiltonian Monte Carlo, which uses a neural network surrogate model for the gradient.
Work done as a research assistant with Eric Darve.
|
|
A Review of the Article "Gradient Descent Provably Optimizes Over-parameterized Neural Networks"
ETH Zurich
Aug. 2020
paper
This work theoretically studied convergence for a shallow neural network when trained with gradient descent.
Using a gradient flow argument, the dynamics of the predictions, rather than the weights, were analyzed directly.
Convergence was proved under the condition that the neural network is polynomially over-parameterized and the least eigenvalue of a data dependent matrix is positive.
Work supervised by Arnulf Jentzen as part of my bachelor thesis at ETH Zurich.
|
|
Image Semantic Segmentation based on Deep Learning
Zhejiang University
Aug. 2019
Using TensorFlow, a Mask R-CNN model was trained and evaluated on a new hand-curated dataset.
Work done with Filip Christiansen as part of a research visit at Zhejiang University in Hangzhou, China.
|
|
Stanford University
Ph.D. in Computational and Mathematical Engineering
Stanford, United States
2021 — Present
GPA: 4.16/4.3
|
|
University of Oxford
MS in Mathematical and Computational Finance
Oxford, United Kingdom
2020 — 2021
|
|
ETH Zurich
Exchange Student, Department of Mathematics
Zurich, Switzerland
2019 — 2020
|
|
KTH Royal Institute of Technology
BS in Engineering Physics
Stockholm, Sweden
2017 — 2020
GPA: 4.94/5.00
|
|
Graduate Teaching Assistantships
Stanford, United States
- Advanced Investment Science: MS&E 245B (Spring 2024)
- Financial Risk Analytics: MS&E 246 (Winter 2024)
- Applied Data Science: CME 218 (Fall 2023)
Mentoring Stanford graduate students working on machine learning projects.
- Partial Differential Equations: CME 303 (Fall 2022)
A graduate class on partial differential equations.
- Machine Learning: CS 229 (Summer 2022)
Topics include supervised learning (deep learning), unsupervised learning, and reinforcement learning.
|
Blog
Short articles on various topics.
|
|