Publication

STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization

Volkan Cevher, Ali Kavis
2021
Conference paper

Abstract

In this work we investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions, and the goal is to find an approximate stationary point. The most popular approach to handling such problems is variance reduction techniques, which are also known to obtain tight convergence rates, matching the lower bounds in this case. Nevertheless, these techniques require careful maintenance of anchor points in conjunction with appropriately selected “mega-batch sizes”. This leads to a challenging hyperparameter tuning problem that weakens their practicality. Recently, [Cutkosky and Orabona, 2019] have shown that one can employ recursive momentum in order to avoid the use of anchor points and large batch sizes, and still obtain the optimal rate for this setting. Yet their method, called STORM, crucially relies on knowledge of the smoothness, as well as a bound on the gradient norms. In this work we propose STORM+, a new method that is completely parameter-free, does not require large batch sizes, and obtains the optimal $O(1/T^{1/3})$ rate for finding an approximate stationary point. Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
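
To make the recursive momentum idea mentioned in the abstract concrete, here is a minimal Python sketch of a STORM-style gradient estimator on a one-dimensional toy problem. It is an illustration under stated assumptions, not the authors' implementation: the toy loss, the function names, and the constant step size `eta` and momentum weight `a` are all hypothetical choices made for the example. In particular, the sketch does not reproduce the adaptive rules that STORM+ uses to set these two parameters; it only shows the estimator that both methods build on, which reuses the current sample at the previous iterate to correct the running direction.

```python
import random


def stoch_grad(x, xi):
    """Stochastic gradient of the toy loss f(x; xi) = x^2 / (1 + x^2) + xi * x.

    The expected loss x^2 / (1 + x^2) is smooth and non-convex (assumed toy
    objective); xi is zero-mean noise, so the gradient estimate is unbiased.
    """
    return 2.0 * x / (1.0 + x * x) ** 2 + xi


def recursive_momentum_sgd(x0, steps, eta=0.1, a=0.1, noise=0.1, seed=0):
    """SGD with a recursive momentum (STORM-style) gradient estimator.

    eta and a are fixed, hand-picked constants (an assumption of this sketch);
    STORM+ instead sets both adaptively, which is not shown here.
    """
    rng = random.Random(seed)
    x_prev = x0
    # First direction: a plain stochastic gradient, one sample, no mega-batch.
    d = stoch_grad(x_prev, rng.gauss(0.0, noise))
    x = x_prev - eta * d
    for _ in range(steps - 1):
        xi = rng.gauss(0.0, noise)
        g_new = stoch_grad(x, xi)       # gradient at the current iterate
        g_old = stoch_grad(x_prev, xi)  # same sample at the previous iterate
        # Recursive momentum: d_t = g_new + (1 - a) * (d_{t-1} - g_old)
        d = g_new + (1.0 - a) * (d - g_old)
        x_prev, x = x, x - eta * d
    return x
```

Calling `recursive_momentum_sgd(1.0, 2000)` runs the method from the starting point 1.0 with one sample per step. Setting `a=1.0` discards the correction term and recovers plain SGD, which gives a simple baseline for comparison.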

Official source

https://infoscience.epfl.ch/record/289807?ln=fr

Ontological proximity

Information engineering

Machine learning: Artificial neural network

Mathematics

Analysis (mathematics): Numerical analysis

Related concepts (32)

Related publications (69)

Related MOOCs (17)

Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks

Nicolas Henri Bernard Flammarion, Hristo Georgiev Papazov, Scott William Pesme

In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent. We leverage a continuous-time approach in the analysis of momentum gradient descent with step size $\gamma$ and momentum parameter $\beta$ that allows u ...

2024

Residual-based attention in physics-informed neural networks

On the Generalization of Stochastic Gradient Descent with Momentum