The article examines in some detail the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size and slow adaptation regime. The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value. The size of the re-scaling is determined by the value of the momentum parameter. The equivalence result is established for all time instants and not only in steady-state. The analysis is carried out for general strongly convex and smooth risk functions, and is not limited to quadratic risks. One notable conclusion is that the well-known benefits of momentum constructions for deterministic optimization problems do not necessarily carry over to the adaptive online setting when small constant step-sizes are used to enable continuous adaptation and learning in the presence of persistent gradient noise. From simulations, the equivalence between momentum and standard stochastic gradient methods is also observed for non-differentiable and non-convex problems.
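The step-size equivalence described above can be illustrated with a small simulation. The sketch below is not the paper's code: the quadratic risk, problem dimensions, and parameter values (mu = 1e-3, beta = 0.9) are illustrative assumptions. It averages the mean-square deviation of heavy-ball momentum SGD and of plain SGD run with the rescaled step-size mu/(1 - beta) over independent streaming-data runs; in the small step-size (slow adaptation) regime the two curves should track each other closely.

import numpy as np

# Minimal sketch (illustrative only): compare momentum SGD with plain SGD
# using the rescaled step-size mu/(1 - beta) on a strongly convex quadratic
# risk E[(d - u^T w)^2] with streaming data and a small constant step-size.

rng = np.random.default_rng(0)
dim, n_steps, n_runs = 10, 20000, 50
w_star = rng.standard_normal(dim)            # true model (assumed for the demo)
mu, beta = 1e-3, 0.9                         # small step-size, momentum parameter

def stream_grad(w):
    """Instantaneous (noisy) gradient of 0.5*(d - u^T w)^2 from one fresh sample."""
    u = rng.standard_normal(dim)                       # regressor
    d = u @ w_star + 0.1 * rng.standard_normal()       # noisy measurement
    return -(d - u @ w) * u

msd_mom = np.zeros(n_steps)   # mean-square deviation of momentum SGD
msd_sgd = np.zeros(n_steps)   # mean-square deviation of rescaled plain SGD
for _ in range(n_runs):
    w_m = np.zeros(dim); v = np.zeros(dim)   # momentum iterate and velocity
    w_s = np.zeros(dim)                      # plain SGD iterate
    for k in range(n_steps):
        # heavy-ball momentum update with step-size mu
        v = beta * v - mu * stream_grad(w_m)
        w_m = w_m + v
        # plain SGD with the equivalent, rescaled step-size mu/(1 - beta)
        w_s = w_s - (mu / (1 - beta)) * stream_grad(w_s)
        msd_mom[k] += np.sum((w_m - w_star) ** 2) / n_runs
        msd_sgd[k] += np.sum((w_s - w_star) ** 2) / n_runs

print("final MSD (momentum SGD):     ", msd_mom[-1])
print("final MSD (rescaled plain SGD):", msd_sgd[-1])

Plotting msd_mom and msd_sgd against the iteration index should show the two trajectories nearly coinciding for all time instants, not only in steady state, which is the behaviour the abstract describes.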