On Convergence-Diagnostic based Step Sizes for Stochastic Gradient Descent
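To make the title's idea concrete, below is a minimal sketch of SGD whose step size is driven by a convergence diagnostic. It assumes a Pflug-type statistic (the running sum of inner products of consecutive stochastic gradients, with the constant step size halved once the sum turns negative); the least-squares problem, the burn-in length, and the halving factor are illustrative choices, not necessarily the paper's exact procedure.

# Illustrative sketch (not the paper's exact algorithm): constant-step-size SGD
# on a least-squares problem, where a Pflug-type convergence diagnostic -- the
# running sum of inner products of consecutive stochastic gradients -- triggers
# a step-size halving once it turns negative (a sign that the iterates have
# reached their stationary "saturation" regime for the current step size).
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star + 0.5 * rng.normal(size=n)

def stochastic_grad(w, i):
    """Gradient of 0.5 * (x_i^T w - y_i)^2 with respect to w."""
    return (X[i] @ w - y[i]) * X[i]

w = np.zeros(d)
gamma = 0.1          # initial (constant) step size
burn_in = 100        # ignore the first iterates after each restart
S, count = 0.0, 0    # running Pflug statistic and iteration counter
g_prev = None

for t in range(20000):
    i = rng.integers(n)
    g = stochastic_grad(w, i)
    w -= gamma * g

    if g_prev is not None:
        S += g @ g_prev
        count += 1
    g_prev = g

    # Diagnostic check: once the accumulated inner products become negative,
    # consecutive stochastic gradients point in opposing directions on average,
    # so we halve the step size and restart the statistic.
    if count > burn_in and S < 0:
        gamma *= 0.5
        S, count, g_prev = 0.0, 0, None

print(f"final step size: {gamma:.4g}, distance to w*: {np.linalg.norm(w - w_star):.4g}")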
Related publications (32)
In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent. We leverage a continuous-time approach in the analysis of momentum gradient descent with step size γ and momentum parameter β that allows u ...
In inverse problems, the task is to reconstruct an unknown signal from its possibly noise-corrupted measurements. Penalized-likelihood-based estimation and Bayesian estimation are two powerful statistical paradigms for the resolution of such problems. They ...
Turning pass-through network architectures into iterative ones, which use their own output as input, is a well-known approach for boosting performance. In this paper, we argue that such architectures offer an additional benefit: The convergence rate of the ...
Driven by the need for more efficient and seamless integration of physical models and data, physics-informed neural networks (PINNs) have seen a surge of interest in recent years. However, ensuring the reliability of their convergence and accuracy remains ...
In this PhD manuscript, we explore optimisation phenomena which occur in complex neural networks through the lens of 2-layer diagonal linear networks. This rudimentary architecture, which consists of a two-layer feedforward linear network with a diagonal ...
Distributed learning is key to enabling the training of modern large-scale machine learning models by parallelising the learning process. Collaborative learning is essential for learning from privacy-sensitive data that is distributed across various ...
In this thesis, we study two closely related directions: robustness and generalization in modern deep learning. Deep learning models based on empirical risk minimization are known to often be non-robust to small, worst-case perturbations known as adversari ...
In the rapidly evolving landscape of machine learning research, neural networks stand out with their ever-expanding number of parameters and reliance on increasingly large datasets. The financial cost and computational resources required for the training p ...
While momentum-based accelerated variants of stochastic gradient descent (SGD) are widely used when training machine learning models, there is little theoretical understanding of the generalization error of such methods. In this work, we first show that th ...
In the past few years, Machine Learning (ML) techniques have ushered in a paradigm shift, allowing the harnessing of ever more abundant sources of data to automate complex tasks. The technical workhorse behind these important breakthroughs arguably lies in ...
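The first related publication above studies momentum gradient descent with step size γ and momentum parameter β via a continuous-time approach. For reference, here is a minimal sketch of the underlying discrete heavy-ball recursion x_{k+1} = x_k - γ∇f(x_k) + β(x_k - x_{k-1}); the quadratic objective and the particular values of γ and β are illustrative assumptions, and the continuous-time analysis itself is not reproduced here.

# Illustrative sketch of the heavy-ball momentum recursion
#   x_{k+1} = x_k - gamma * grad f(x_k) + beta * (x_k - x_{k-1})
# on a simple quadratic; gamma, beta, and the objective are arbitrary choices
# made for illustration, not taken from the publication above.
import numpy as np

A = np.diag([1.0, 10.0])          # ill-conditioned quadratic f(x) = 0.5 x^T A x
grad = lambda x: A @ x

gamma, beta = 0.05, 0.9           # step size and momentum parameter
x_prev = x = np.array([5.0, 5.0])

for k in range(200):
    x_next = x - gamma * grad(x) + beta * (x - x_prev)
    x_prev, x = x, x_next

print(f"f(x) after 200 steps: {0.5 * x @ A @ x:.3e}")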