On the Generalization of Stochastic Gradient Descent with Momentum
Related publications (51)
In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent. We leverage a continuous-time approach in the analysis of momentum gradient descent with step size γ and momentum parameter β that allows u ...
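For context, the heavy-ball update this snippet analyzes takes a step along the negative gradient plus a momentum term in the direction of the previous displacement. The sketch below illustrates it on a toy quadratic; the objective, step size, and iteration count are assumptions chosen for illustration, not the paper's setting.

import numpy as np

def momentum_gd(grad, x0, gamma=0.01, beta=0.9, steps=500):
    # Heavy-ball update: x_{t+1} = x_t - gamma * grad(x_t) + beta * (x_t - x_{t-1})
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(steps):
        x_next = x - gamma * grad(x) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Illustrative quadratic f(x) = 0.5 * x^T A x with an assumed diagonal A.
A = np.diag([1.0, 10.0])
x_min = momentum_gd(lambda x: A @ x, np.ones(2))

Setting beta = 0 recovers plain gradient descent with step size gamma.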
Deep neural networks have become ubiquitous in today's technological landscape, finding their way into a vast array of applications. Deep supervised learning, which relies on large labeled datasets, has been particularly successful in areas such as image cla ...
EPFL, 2023
Within the context of contemporary machine learning problems, the efficiency of the optimization process depends on the properties of the model and the nature of the data available, which poses a significant problem as the complexity of either increases ad infinit ...
EPFL, 2023
We develop a principled approach to end-to-end learning in stochastic optimization. First, we show that the standard end-to-end learning algorithm admits a Bayesian interpretation and trains a posterior Bayes action map. Building on the insights of this an ...
The monumental progress in the development of machine learning models has led to a plethora of applications with transformative effects in engineering and science. This has also turned the attention of the research community towards the pursuit of construc ...
Poisoning attacks compromise the training data utilized to train machine learning (ML) models, diminishing their overall performance, manipulating predictions on specific test samples, and implanting backdoors. This article thoughtfully explores these atta ...
Recently there has been a surge of interest in understanding implicit regularization properties of iterative gradient-based optimization algorithms. In this paper, we study the statistical guarantees on the excess risk achieved by early-stopped unconstrain ...
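As a rough illustration of early stopping for an unconstrained gradient method: the data, train/validation split, and stopping rule below are all assumptions made for the sketch, not the paper's analysis, which concerns statistical excess-risk guarantees.

import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 50
X, w_true = rng.normal(size=(n, d)), rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

# Hold out a validation set; stop once its loss deteriorates past the best seen.
X_tr, y_tr, X_val, y_val = X[:80], y[:80], X[80:], y[80:]
w = np.zeros(d)
eta = 1e-3
best_val, best_w = np.inf, w.copy()
for t in range(10_000):
    w -= eta * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)   # GD on least squares
    val = np.mean((X_val @ w - y_val) ** 2)
    if val < best_val:
        best_val, best_w = val, w.copy()
    elif val > 1.05 * best_val:   # crude illustrative stopping threshold
        break

The iterate returned at the stopping time (best_w) plays the role of an implicitly regularized estimator: stopping early keeps w close to the small-norm initialization.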
In this thesis, we study two closely related directions: robustness and generalization in modern deep learning. Deep learning models based on empirical risk minimization are known to be often non-robust to small, worst-case perturbations known as adversari ...
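The "small, worst-case perturbations" referred to here are adversarial examples; one standard construction is a signed-gradient step under an L-infinity budget, sketched below on an assumed linear classifier. The model, loss, and budget are illustrative and not taken from the thesis.

import numpy as np

def fgsm_perturbation(grad_x, eps):
    # Worst-case L-infinity perturbation of budget eps for a locally linear loss.
    return eps * np.sign(grad_x)

# Illustrative fixed linear classifier w, input x, label y in {-1, +1},
# with logistic loss log(1 + exp(-y * w.x)).
w = np.array([1.0, -2.0, 0.5])
x, y = np.array([0.2, 0.1, -0.3]), 1.0
grad_x = -y * w / (1.0 + np.exp(y * w @ x))   # gradient of the loss w.r.t. x
x_adv = x + fgsm_perturbation(grad_x, eps=0.1)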
We study the performance of Stochastic Cubic Regularized Newton (SCRN) on a class of functions satisfying the gradient dominance property with 1 ≤ α ≤ 2, which holds in a wide range of applications in machine learning and signal processing. This conditio ...
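For reference, a common statement of the gradient dominance (Łojasiewicz-type) condition with exponent α is given below; the constant τ and this exact normalization are assumptions on my part, with α = 2 recovering the Polyak–Łojasiewicz condition:

f(x) - \min_{x'} f(x') \;\le\; \tau \, \|\nabla f(x)\|^{\alpha}, \qquad 1 \le \alpha \le 2 .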
This work aims to study the effects of wind uncertainties in civil engineering structural design. Optimising the design of a structure for safety or operability without factoring in these uncertainties can result in a design that is not robust to these per ...