Publication

Last iterate convergence of SGD for Least-Squares in the Interpolation regime

Nicolas Henri Bernard Flammarion, Aditya Vardhan Varre, Loucas Pillaud-Vivien
2021
Report or working paper

Abstract

Motivated by the recent successes of neural networks, which can fit the data perfectly and still generalize well, we study the noiseless model in the fundamental least-squares setup. We assume that an optimal predictor fits inputs and outputs perfectly, $\langle \theta_*, \phi(X) \rangle = Y$, where $\phi(X)$ is a possibly infinite-dimensional non-linear feature map. To solve this problem, we consider the estimator given by the last iterate of stochastic gradient descent (SGD) with a constant step size. In this context, our contribution is twofold: (i) from a (stochastic) optimization perspective, we exhibit an archetypal problem for which we can explicitly show the convergence of the final SGD iterate for a non-strongly convex problem with constant step size, whereas usual results rely on some form of averaging; and (ii) from a statistical perspective, we give explicit non-asymptotic convergence rates in the over-parameterized setting and leverage a fine-grained parameterization of the problem to exhibit polynomial rates that can be faster than $O(1/T)$. We also establish the link with reproducing kernel Hilbert spaces.
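The estimator described in the abstract can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' code: it truncates $\phi$ to a finite dimension $d = 50$, assumes centred Gaussian features with a diagonal covariance whose eigenvalues decay as $\lambda_i = i^{-2}$, and picks $\theta_*$ with coordinates $i^{-1}$; the step size $\gamma = 0.1$ and the helper names `sgd_last_iterate` and `population_risk` are likewise assumptions. Each step applies the constant step-size SGD update for least squares, $\theta_t = \theta_{t-1} - \gamma (\langle \theta_{t-1}, \phi(x_t) \rangle - y_t)\, \phi(x_t)$, and the function returns the last iterate without any averaging. Because the labels are noiseless, $y_t = \langle \theta_*, \phi(x_t) \rangle$, the error obeys $\theta_t - \theta_* = (I - \gamma\, \phi(x_t) \otimes \phi(x_t))(\theta_{t-1} - \theta_*)$, so the randomness is purely multiplicative, with no additive noise term.

```python
import math
import random


def population_risk(theta, theta_star, eigenvalues):
    """E[<theta - theta_*, phi(X)>^2] for independent, centred features
    whose variances are `eigenvalues` (assumed diagonal covariance)."""
    return sum(lam * (t - ts) ** 2
               for lam, t, ts in zip(eigenvalues, theta, theta_star))


def sgd_last_iterate(theta_star, eigenvalues, step_size, n_steps, seed=0):
    """Hypothetical helper: constant step-size SGD on noiseless least squares.

    Returns the LAST iterate (no averaging) and the population risk
    recorded after every step."""
    rng = random.Random(seed)
    theta = [0.0] * len(theta_star)  # start from the origin
    risks = []
    for _ in range(n_steps):
        # Fresh sample: Gaussian feature with variance lambda_i per coordinate (assumption).
        phi = [rng.gauss(0.0, math.sqrt(lam)) for lam in eigenvalues]
        # Noiseless label: Y = <theta_*, phi(X)> exactly (interpolation regime).
        y = sum(ts * p for ts, p in zip(theta_star, phi))
        residual = sum(t * p for t, p in zip(theta, phi)) - y
        # SGD step: theta <- theta - gamma * (<theta, phi> - y) * phi
        theta = [t - step_size * residual * p for t, p in zip(theta, phi)]
        risks.append(population_risk(theta, theta_star, eigenvalues))
    return theta, risks


# Hypothetical toy instance: power-law spectrum and optimum.
d = 50
eigenvalues = [(i + 1) ** -2.0 for i in range(d)]
theta_star = [(i + 1) ** -1.0 for i in range(d)]
theta, risks = sgd_last_iterate(theta_star, eigenvalues, step_size=0.1, n_steps=5000)
print(f"risk of the last iterate after {len(risks)} steps: {risks[-1]:.2e}")
```

Recording the population risk $\sum_i \lambda_i (\theta_{t,i} - \theta_{*,i})^2$ at every step makes it easy to plot how the last iterate's risk decays; varying the spectrum and $\theta_*$ is one way to explore a fine-grained parameterization, but this toy run is not a check of the paper's theorems.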

Official source

https://infoscience.epfl.ch/record/283259?ln=en

Ontological neighbourhood

Statistics

Data analysis: Regression analysis

Information engineering

Machine learning: Artificial neural networks

Related publications

Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks

Nicolas Henri Bernard Flammarion, Hristo Georgiev Papazov, Scott William Pesme

In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent. We leverage a continuous-time approach in the analysis of momentum gradient descent with step size $\gamma$ and momentum parameter $\beta$ that allows u ...

2024

Robust Distributed Learning: Tight Error Bounds and Breakdown Point under Data Heterogeneity

The statistical complexity of early-stopped mirror descent
