Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum Minimization

We propose AdaSpider, an adaptive variance-reduction method for minimizing $L$-smooth, non-convex functions with a finite-sum structure. In essence, AdaSpider combines an adaptive step-size schedule, inspired by AdaGrad [Duchi et al., 2011; McMahan & Streeter, 2010] but fairly distinct from it, with the recursive stochastic path-integrated estimator proposed in [Fang et al., 2018]. To our knowledge, AdaSpider is the first parameter-free non-convex variance-reduction method, in the sense that it does not require knowledge of problem-dependent parameters such as the smoothness constant $L$, the target accuracy $\epsilon$, or any bound on gradient norms. In doing so, we are able to compute an $\epsilon$-stationary point with $\tilde{O}(n + \sqrt{n}/\epsilon^2)$ oracle calls, which matches the respective lower bound up to logarithmic factors.
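
To make the two ingredients concrete, the following is a minimal sketch of how a recursive (SPIDER-style) gradient estimator can be paired with an AdaGrad-norm-like step size that uses no smoothness constant or target accuracy. It is not the authors' algorithm as published: the exact step-size formula, the choice to recompute the full gradient every `n` iterations, and the names `adaptive_spider_sketch` and `grad_i` are assumptions made for illustration only.

```python
import numpy as np


def adaptive_spider_sketch(grad_i, n, x0, iters, seed=0):
    """Illustrative sketch only (hypothetical helper, not the paper's code).

    grad_i(i, x) returns the gradient of the i-th component f_i at x,
    for an objective f(x) = (1/n) * sum_i f_i(x).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    x_prev = x.copy()
    v = np.zeros_like(x)
    acc = 0.0  # running sum of squared estimator norms

    for t in range(iters):
        if t % n == 0:
            # Periodic restart: exact full gradient (n oracle calls).
            v = np.mean([grad_i(i, x) for i in range(n)], axis=0)
        else:
            # Recursive estimator: correct the previous estimate with the
            # gradient difference of one sampled component (2 oracle calls).
            i = rng.integers(n)
            v = v + grad_i(i, x) - grad_i(i, x_prev)

        # Assumed AdaGrad-norm-style step size: it depends only on n and on
        # the estimators observed so far, not on L or epsilon.
        acc += float(v @ v)
        step = 1.0 / (n ** 0.25 * np.sqrt(np.sqrt(n) + acc))

        x_prev = x.copy()
        x = x - step * v
    return x
```

For example, with components `f_i(x) = 0.5 * (x - c_i)**2` one can pass `grad_i = lambda i, x: x - c[i]`; the iterates should then approach the mean of the `c_i`. Each non-restart iteration costs two component-gradient evaluations, so a pass of `n` iterations costs on the order of `n` oracle calls, in keeping with the additive `n` term in the stated complexity.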

Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks

Nicolas Henri Bernard Flammarion, Hristo Georgiev Papazov, Scott William Pesme

In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent. We leverage a continuous-time approach in the analysis of momentum gradient descent with step size $\gamma$ and momentum parameter $\beta$ that allows u ...

2024

On the Generalization of Stochastic Gradient Descent with Momentum

Scalable constrained optimization
