Publications related to Stochastic gradient descent

On the Generalization of Stochastic Gradient Descent with Momentum

While momentum-based accelerated variants of stochastic gradient descent (SGD) are widely used when training machine learning models, there is little theoretical understanding on the generalization error of such methods. In this work, we first show that th ...

Microtome Publishing, 2024

Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks

Nicolas Henri Bernard Flammarion, Hristo Georgiev Papazov, Scott William Pesme

In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent. We leverage a continuous-time approach in the analysis of momentum gradient descent with step size $\gamma$ and momentum parameter $\beta$ that allows u ...

2024

Understanding generalization and robustness in modern deep learning

Maksym Andriushchenko

In this thesis, we study two closely related directions: robustness and generalization in modern deep learning. Deep learning models based on empirical risk minimization are known to be often non-robust to small, worst-case perturbations known as adversari ...

EPFL, 2024

Scalable constrained optimization

Maria-Luiza Vladarean

Modern optimization is tasked with handling applications of increasingly large scale, chiefly due to the massive amounts of widely available data and the ever-growing reach of Machine Learning. Consequently, this area of research is under steady pressure t ...

EPFL, 2024

Optimization Algorithms for Decentralized, Distributed and Collaborative Machine Learning

Anastasiia Koloskova

Distributed learning is the key for enabling training of modern large-scale machine learning models, through parallelising the learning process. Collaborative learning is essential for learning from privacy-sensitive data that is distributed across various ...

EPFL, 2024

Revisiting Character-level Adversarial Attacks for Language Models

Volkan Cevher, Grigorios Chrysos, Fanghui Liu, Yongtao Wu, Elias Abad Rocamora

Adversarial attacks in Natural Language Processing apply perturbations in the character or token levels. Token-level attacks, gaining prominence for their use of gradient-based methods, are susceptible to altering sentence semantics, leading to invalid adv ...

2024

Topics in statistical physics of high-dimensional machine learning

Hugo Chao Cui

In the past few years, Machine Learning (ML) techniques have ushered in a paradigm shift, allowing the harnessing of ever more abundant sources of data to automate complex tasks. The technical workhorse behind these important breakthroughs arguably lies in ...

EPFL, 2024

Few-shot Learning for Efficient and Effective Machine Learning Model Adaptation

Arnout Jan J Devos

Machine learning (ML) enables artificially intelligent (AI) agents to learn autonomously from data obtained from their environment to perform tasks. Modern ML systems have proven to be extremely effective, reaching or even exceeding human intelligence. Altho ...

EPFL, 2024

Robust machine learning for neuroscientific inference

Steffen Schneider

Modern neuroscience research is generating increasingly large datasets, from recording thousands of neurons over long timescales to behavioral recordings of animals spanning weeks, months, or even years. Despite a great variety in recording setups and expe ...

EPFL, 2024

Statistical Inference for Inverse Problems: From Sparsity-Based Methods to Neural Networks

Pakshal Narendra Bohra

In inverse problems, the task is to reconstruct an unknown signal from its possibly noise-corrupted measurements. Penalized-likelihood-based estimation and Bayesian estimation are two powerful statistical paradigms for the resolution of such problems. They ...

EPFL, 2024

Toward plasma drifts in EMC3: Implementation of gradient, divergence, and particle tracing schemes

Matthieu Benoit C. Jacobs

This paper presents a first implementation of gradient, divergence, and particle tracing schemes for the EMC3 code, a stochastic 3D plasma fluid code widely employed for edge plasma and impurity transport modeling in tokamaks and stellarators. These scheme ...

Wiley-VCH Verlag GmbH, 2024

Deep Learning Theory Through the Lens of Diagonal Linear Networks

Scott William Pesme

In this PhD manuscript, we explore optimisation phenomena which occur in complex neural networks through the lens of $2$-layer diagonal linear networks. This rudimentary architecture, which consists of a two layer feedforward linear network with a diagonal ...

EPFL, 2024

Byzantine Fault-Tolerance in Federated Local SGD Under 2f-Redundancy

Nirupam Gupta

In this article, we study the problem of Byzantine fault-tolerance in a federated optimization setting, where there is a group of agents communicating with a centralized coordinator. We allow up to $f$ Byzantine-faulty agents, which may not follow a prescr ...

IEEE-Inst Electrical Electronics Engineers Inc, 2023

Universal and adaptive methods for robust stochastic optimization

Ali Kavis

Within the context of contemporary machine learning problems, efficiency of optimization process depends on the properties of the model and the nature of the data available, which poses a significant problem as the complexity of either increases ad infinit ...

EPFL, 2023

Data Downloaded via Parachute from a NASA Super-Pressure Balloon

David Richard Harvey, Mathilde Jauzac, Richard Massey, Lun Li

In April 2023, the superBIT telescope was lifted to the Earth's stratosphere by a helium-filled super-pressure balloon to acquire astronomical imaging from above (99.5% of) the Earth's atmosphere. It was launched from New Zealand and then, for 40 days, cir ...

MDPI, 2023

Phenomenological theory of variational quantum ground-state preparation

Ivano Tavernelli, Giuseppe Carleo

The variational approach is a cornerstone of computational physics, considering both conventional and quantum computing computational platforms. The variational quantum eigensolver algorithm aims to prepare the ground state of a Hamiltonian exploiting para ...

Amer Physical Soc, 2023

Towards Stable and Efficient Adversarial Training against $l_1$ Bounded Adversarial Attacks

Sabine Süsstrunk, Mathieu Salzmann, Yulun Jiang, Chen Liu, Zhuoyi Huang

We address the problem of stably and efficiently training a deep neural network robust to adversarial perturbations bounded by an $l_1$ norm. We demonstrate that achieving robustness against $l_1$-bounded perturbations is more challenging than in the $l_2$ ...

2023

Robust Collaborative Learning with Linear Gradient Overhead

Rachid Guerraoui, John Stephan, Sadegh Farhadkhani, Le Nguyen Hoang, Nirupam Gupta, Rafaël Benjamin Pinot

Collaborative learning algorithms, such as distributed SGD (or D-SGD), are prone to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data or malicious behaviors. While many solutions have been ...

PMLR, 2023

Impact of Redundancy on Resilience in Distributed Optimization and Learning

Nirupam Gupta, Shuo Liu

This paper considers the problem of resilient distributed optimization and stochastic learning in a server-based architecture. The system comprises a server and multiple agents, where each agent has its own local cost function. The agents collaborate with ...

Assoc Computing Machinery, 2023

Fast Adversarial Training With Adaptive Step Size

Mathieu Salzmann, Sabine Süsstrunk, Chen Liu, Zhuoyi Huang, Yong Zhang, Jue Wang

While adversarial training and its variants have been shown to be the most effective algorithms to defend against adversarial attacks, their extremely slow training process makes it hard to scale to large datasets like ImageNet. The key idea of recent works to ...

IEEE-Inst Electrical Electronics Engineers Inc, 2023