Publication

STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization

Volkan Cevher, Ali Kavis
2021
Conference paper
Abstract

In this work we investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions, and the goal is to find an approximate stationary point. The most popular approach to handling such problems is variance reduction techniques, which are also known to obtain tight convergence rates, matching the lower bounds in this case. Nevertheless, these techniques require careful maintenance of anchor points in conjunction with appropriately selected "mega-batch sizes". This leads to a challenging hyperparameter tuning problem that weakens their practicality. Recently, [Cutkosky and Orabona, 2019] have shown that one can employ recursive momentum in order to avoid the use of anchor points and large batch sizes, and still obtain the optimal rate for this setting. Yet, their method, called STORM, crucially relies on knowledge of the smoothness constant, as well as a bound on the gradient norms. In this work we propose STORM+, a new method that is completely parameter-free, does not require large batch sizes, and obtains the optimal O(1/T^{1/3}) rate for finding an approximate stationary point. Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
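To make the recursive-momentum idea concrete, here is a minimal Python sketch of a STORM-style loop: one fresh sample per step is evaluated at both the current and previous iterate, and the step size and momentum parameter are set from accumulated gradient norms. The specific schedules (eta0 / (1 + sum of squared norms)^{1/3} and a = c * eta^2) and the constants eta0, c are illustrative placeholders, not the exact adaptive rules derived in the STORM+ paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stochastic objective: f(x) = E_xi[0.5 * ||x - xi||^2] with xi ~ N(0, I),
# whose only stationary point is x = 0.
def stoch_grad(x, xi):
    return x - xi

def storm_style_sketch(x0, steps=2000, eta0=0.5, c=10.0):
    x = np.asarray(x0, dtype=float)
    xi = rng.standard_normal(x.shape)
    d = stoch_grad(x, xi)              # recursive-momentum gradient estimate
    sum_sq = float(d @ d)              # accumulated squared stochastic-gradient norms

    for _ in range(steps):
        eta = eta0 / (1.0 + sum_sq) ** (1.0 / 3.0)   # adaptive step size (illustrative schedule)
        a = min(1.0, c * eta ** 2)                   # momentum parameter tied to the step size
        x_prev, x = x, x - eta * d

        xi = rng.standard_normal(x.shape)            # one fresh sample per step; no mega-batches
        g_new = stoch_grad(x, xi)
        # Recursive variance-reduced estimate: the same sample is evaluated
        # at both the new and the previous iterate.
        d = g_new + (1.0 - a) * (d - stoch_grad(x_prev, xi))
        sum_sq += float(g_new @ g_new)
    return x

print(storm_style_sketch(np.ones(5)))   # should approach the stationary point at 0
```

The key difference from plain momentum SGD is the correction term (d - stoch_grad(x_prev, xi)): reusing the same sample at two consecutive iterates reduces the variance of the estimate without anchor points or large batches.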

Related concepts (32)
Iterative method
In computational mathematics, an iterative method is a mathematical procedure that uses an initial value to generate a sequence of improving approximate solutions for a class of problems, in which the n-th approximation is derived from the previous ones. A specific implementation with termination criteria for a given iterative method like gradient descent, hill climbing, Newton's method, or quasi-Newton methods like BFGS, is an algorithm of the iterative method.
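For illustration, here is a minimal iterative method in Python: gradient descent on a one-dimensional function, where each approximation is derived from the previous one and a simple termination criterion stops the loop. The objective, step size, and tolerance are arbitrary example choices.

```python
def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Minimal iterative method: each approximation is derived from the previous one."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:      # termination criterion: near-stationary point
            break
        x = x - step * g      # next approximation from the current one
    return x

# Example: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))  # ≈ 3.0
```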
Deep learning
Deep learning is part of a broader family of machine learning methods, which is based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.
Euler method
In mathematics and computational science, the Euler method (also called the forward Euler method) is a first-order numerical procedure for solving ordinary differential equations (ODEs) with a given initial value. It is the most basic explicit method for numerical integration of ordinary differential equations and is the simplest Runge–Kutta method. The Euler method is named after Leonhard Euler, who first proposed it in his book Institutionum calculi integralis (published 1768–1770).
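A minimal sketch of the forward Euler method in Python, applied to the test equation y' = y with y(0) = 1; the step size and step count are arbitrary example values.

```python
def euler(f, t0, y0, h, n_steps):
    """Forward Euler: y_{k+1} = y_k + h * f(t_k, y_k)."""
    t, y = t0, y0
    for _ in range(n_steps):
        y = y + h * f(t, y)
        t = t + h
    return y

# Example: y' = y, y(0) = 1, integrated to t = 1; exact solution is e ≈ 2.71828.
print(euler(lambda t, y: y, 0.0, 1.0, h=0.001, n_steps=1000))  # ≈ 2.717 (first-order accurate)
```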
Related publications (69)

Residual-based attention in physics-informed neural networks

Nikolaos Stergiopoulos, Sokratis Anagnostopoulos

Driven by the need for more efficient and seamless integration of physical models and data, physics-informed neural networks (PINNs) have seen a surge of interest in recent years. However, ensuring the reliability of their convergence and accuracy remains ...
Lausanne, 2024

Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks

Nicolas Henri Bernard Flammarion, Hristo Georgiev Papazov, Scott William Pesme

In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent. We leverage a continuous-time approach in the analysis of momentum gradient descent with step size γ and momentum parameter β that allows u ...
2024
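As a companion to the entry above, here is a generic heavy-ball (momentum) gradient-descent update in Python with step size γ and momentum parameter β. This is only the standard discrete update on a toy diagonal quadratic, not the continuous-time analysis or the diagonal-linear-network setting studied in that publication.

```python
import numpy as np

def heavy_ball(grad, x0, gamma=0.01, beta=0.9, steps=500):
    """Gradient descent with heavy-ball momentum:
    v_{k+1} = beta * v_k - gamma * grad(x_k);  x_{k+1} = x_k + v_{k+1}."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = beta * v - gamma * grad(x)
        x = x + v
    return x

# Example: a diagonal quadratic f(x) = 0.5 * sum(A * x**2), so each coordinate evolves independently.
A = np.array([1.0, 10.0])
print(heavy_ball(lambda x: A * x, x0=np.array([1.0, 1.0])))  # → close to the minimizer at 0
```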

On the Generalization of Stochastic Gradient Descent with Momentum

Volkan Cevher, Kimon Antonakopoulos

While momentum-based accelerated variants of stochastic gradient descent (SGD) are widely used when training machine learning models, there is little theoretical understanding of the generalization error of such methods. In this work, we first show that th ...
Brookline, 2024
Related MOOCs (17)
Introduction to optimization on smooth manifolds: first order methods
Learn to optimize on smooth, nonlinear spaces: Join us to build your foundations (starting at "what is a manifold?") and confidently implement your first algorithm (Riemannian gradient descent).
Neuronal Dynamics - Computational Neuroscience of Single Neurons
The activity of neurons in the brain and the code used by these neurons is described by mathematical neuron models at different levels of detail.
