The design and implementation of efficient concurrent data structures have received significant attention. Most of this work, however, has focused on concurrent data structures providing good worst-case guarantees, even though, in real workloads, objects are often ...
State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extra-gradient), limiting SGD updates to a sub ...
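As a rough illustration of the "perturbing parameters" family of variations mentioned above, the sketch below contrasts a plain SGD step with an extragradient-style step, which evaluates the gradient at a look-ahead (perturbed) point before committing the update. The toy scalar objective, step size, and function names are assumptions made only for this example; they are not taken from the abstract.

```python
# Toy objective (w - 3)^2 and its gradient (illustrative assumption).
def grad(w):
    return 2.0 * (w - 3.0)

def sgd_step(w, lr):
    # Plain SGD: step against the gradient at the current point.
    return w - lr * grad(w)

def extragradient_step(w, lr):
    # Extragradient-style update: take a look-ahead step first,
    # then apply the gradient evaluated at that perturbed point.
    w_lookahead = w - lr * grad(w)
    return w - lr * grad(w_lookahead)

w_sgd, w_eg = 10.0, 10.0
for _ in range(50):
    w_sgd = sgd_step(w_sgd, lr=0.1)
    w_eg = extragradient_step(w_eg, lr=0.1)
print(w_sgd, w_eg)  # both approach the minimizer w = 3
```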
It has been observed experimentally that the efficiency of distributed training with stochastic gradient descent (SGD) depends decisively on the batch size and, in asynchronous implementations, on the gradient staleness. In particular, it has been observed that the spe ...
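To make the notion of gradient staleness concrete, the following minimal sketch simulates an asynchronous setup in which every applied gradient was computed on a copy of the parameters from a fixed number of steps earlier. The fixed-staleness model, the toy objective, and all names are simplifying assumptions for illustration; real asynchronous systems exhibit variable, workload-dependent staleness.

```python
from collections import deque

def grad(w):
    return 2.0 * (w - 3.0)  # gradient of the toy objective (w - 3)^2

def async_sgd(w0, lr, staleness, steps):
    """Simulate SGD where each applied gradient was computed on the
    parameters from `staleness` steps ago (illustrative model only)."""
    w = w0
    pending = deque([grad(w0)] * staleness)  # gradients still "in flight"
    for _ in range(steps):
        pending.append(grad(w))        # worker reads the current parameters
        stale_grad = pending.popleft() # server applies the oldest gradient
        w -= lr * stale_grad
    return w

# In this toy model, larger staleness slows convergence toward w = 3.
for s in (0, 4, 16):
    print(s, async_sgd(w0=10.0, lr=0.02, staleness=s, steps=300))
```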
This thesis investigates methods to construct more capable models more efficiently, focusing on two aspects: improved architectures and optimization. We examine principled architectural modifications that reduce computational costs or introduce features fo ...