Last iterate convergence of SGD for Least-Squares in the Interpolation regime
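The page gives no abstract for this title entry, so here is a minimal sketch of the setting the title names, assuming only the title and nothing from the paper itself: constant step-size SGD on a noiseless, overparametrized least-squares problem, where an interpolating solution exists and the last iterate (not an average) can drive the training loss to zero. All dimensions, the step-size rule, and the iteration budget are arbitrary illustrative choices.

```python
# Illustration only (not the paper's algorithm): constant step-size SGD on an
# overparametrized least-squares problem. With n < d and noiseless targets, an
# interpolating solution exists and the last iterate converges to zero loss.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                        # n < d: interpolation regime
A = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
b = A @ w_star                        # noiseless targets -> zero loss attainable

w = np.zeros(d)
gamma = 0.5 / np.max(np.sum(A**2, axis=1))   # conservative constant step size

for t in range(20000):
    i = rng.integers(n)                       # sample one row uniformly
    w -= gamma * (A[i] @ w - b[i]) * A[i]     # stochastic gradient step

print("last-iterate loss:", 0.5 * np.mean((A @ w - b) ** 2))
```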
At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit [12, 9], thus connecting them to kernel methods. We prove that the evolution of an ANN during training can also be described by a kernel: ...
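For reference, the kernel in question is the neural tangent kernel; in its standard form (stated from the literature, not from the truncated abstract), for a network \(f(x;\theta)\),

\[
\Theta(x, x') \;=\; \nabla_\theta f(x;\theta)^{\top}\, \nabla_\theta f(x';\theta),
\]

and under gradient descent on the parameters, the network's function-space evolution is kernel gradient descent with respect to \(\Theta\), which becomes deterministic and constant throughout training in the infinite-width limit.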
In this article, we address the numerical solution of the Dirichlet problem for the three-dimensional elliptic Monge-Ampère equation using a least-squares/relaxation approach. The relaxation algorithm allows the decoupling of the differential operators fro ...
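For orientation, the problem being solved has the standard form (stated from the literature, not from this abstract)

\[
\det D^2 u = f \ \text{in } \Omega \subset \mathbb{R}^3, \qquad u = g \ \text{on } \partial\Omega,
\]

where \(D^2 u\) is the Hessian of \(u\). Least-squares/relaxation approaches of this kind typically introduce an auxiliary matrix field \(\mathbf{p} \approx D^2 u\) constrained by \(\det \mathbf{p} = f\) and minimize \(\|D^2 u - \mathbf{p}\|^2\), so that the fully nonlinear algebraic constraint and the differential operator can be handled in separate, alternating steps.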
We investigate regularized algorithms combined with projection for the least-squares regression problem over a Hilbert space, covering nonparametric regression over a reproducing kernel Hilbert space. We prove convergence results with respect to variants of n ...
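In this setting the object of analysis is the expected risk

\[
\mathcal{E}(f) \;=\; \int_{X \times \mathbb{R}} \big(f(x) - y\big)^2 \, d\rho(x, y), \qquad f \in \mathcal{H},
\]

with \(\mathcal{H}\) a (reproducing kernel) Hilbert space; "projection" here presumably refers to restricting the estimator to a data-dependent finite-dimensional subspace of \(\mathcal{H}\) (for example via Nyström-type subsampling), which is what makes such regularized estimators computationally tractable. This is a contextual paraphrase; the precise assumptions are in the paper.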
Interest in distributed stochastic optimization has risen with the need to train complex Machine Learning models on more data using distributed systems. Increasing the computation power speeds up training but runs into a communication bottleneck between workers ...
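As a generic illustration of where that bottleneck sits (not the method proposed in this work), the sketch below runs synchronous distributed SGD: each worker computes a gradient on its local shard, and every step requires one round of communication to average K vectors of size d, which is exactly the cost that compression and local-update schemes aim to cut.

```python
# Generic illustration (not this paper's method): synchronous distributed SGD.
# Each of K workers computes a gradient on its own data shard; one round of
# communication per step averages the K gradients -- the bottleneck.
import numpy as np

rng = np.random.default_rng(1)
d, K = 50, 4
A = [rng.standard_normal((32, d)) for _ in range(K)]   # one shard per worker
w_star = rng.standard_normal(d)
b = [Ak @ w_star for Ak in A]

w = np.zeros(d)
for t in range(1000):
    # local work: each worker's least-squares gradient on its shard
    grads = [(Ak.T @ (Ak @ w - bk)) / len(bk) for Ak, bk in zip(A, b)]
    # one communication round: average K vectors of size d
    w -= 0.1 * np.mean(grads, axis=0)

loss = np.mean([(Ak @ w - bk) @ (Ak @ w - bk) / (2 * len(bk)) for Ak, bk in zip(A, b)])
print("loss:", loss)
```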
The strong growth condition (SGC) is known to be a sufficient condition for linear convergence of the stochastic gradient method using a constant step-size γ (SGM-CS). In this paper, we provide a necessary condition, for the linear convergence of SGM-CS, t ...
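For context, the strong growth condition is standardly stated as follows (from the literature, not from the truncated abstract): for an objective \(f(x) = \mathbb{E}_i[f_i(x)]\), there exists \(\rho \ge 1\) such that

\[
\mathbb{E}_i\big[\|\nabla f_i(x)\|^2\big] \;\le\; \rho\, \|\nabla f(x)\|^2 \quad \text{for all } x,
\]

which forces every stochastic gradient to vanish at any stationary point of \(f\), i.e. the model interpolates the data.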
We propose two new alternating direction methods to solve “fully” nonsmooth constrained convex problems. Our algorithms have the best known worst-case iteration-complexity guarantee under mild assumptions for both the objective residual and feasibility gap ...
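For orientation, the classical alternating direction template that such methods build on — stated here as background, not as this paper's algorithms — solves \(\min_{x,z} f(x) + g(z)\) subject to \(Ax + Bz = c\) by iterating

\[
\begin{aligned}
x^{k+1} &= \arg\min_x\; f(x) + \tfrac{\rho}{2}\,\|Ax + Bz^k - c + u^k\|^2,\\
z^{k+1} &= \arg\min_z\; g(z) + \tfrac{\rho}{2}\,\|Ax^{k+1} + Bz - c + u^k\|^2,\\
u^{k+1} &= u^k + Ax^{k+1} + Bz^{k+1} - c,
\end{aligned}
\]

with \(u\) the scaled dual variable; the contribution here is worst-case complexity guarantees when both \(f\) and \(g\) are nonsmooth.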
The analysis in Part I [1] revealed interesting properties for subgradient learning algorithms in the context of stochastic optimization. These algorithms are used when the risk functions are non-smooth or involve non-differentiable components. They have b ...
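As a minimal example of the kind of update these analyses cover (illustrative only, not the Part II algorithms), the sketch below runs stochastic subgradient steps on a non-differentiable risk — mean absolute error for a linear model — where sign(r) is a valid subgradient of |r|:

```python
# Illustration only: stochastic subgradient descent on a non-smooth risk,
# mean absolute error for a linear model, with a diminishing step size.
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
for t in range(1, 5001):
    i = rng.integers(n)
    r = A[i] @ w - b[i]
    w -= (0.1 / np.sqrt(t)) * np.sign(r) * A[i]   # subgradient of |a_i^T w - b_i|

print("MAE:", np.mean(np.abs(A @ w - b)))
```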
In this paper, we study regression problems over a separable Hilbert space with the square loss, covering non-parametric regression over a reproducing kernel Hilbert space. We investigate a class of spectral/regularized algorithms, including ridge regressi ...
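As a concrete member of this spectral/regularized family, here is plain kernel ridge regression (a generic sketch; the kernel, bandwidth, and regularization value are arbitrary choices, not the paper's):

```python
# Generic sketch: kernel ridge regression, the canonical spectral/regularized
# estimator. Solves (K + n*lam*I) alpha = y over the training kernel matrix.
import numpy as np

def kernel_ridge(K, y, lam):
    """Return alpha solving (K + n*lam*I) alpha = y."""
    n = len(y)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(100, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(100)
K = np.exp(-((X - X.T) ** 2) / 0.1)          # Gaussian-kernel Gram matrix
alpha = kernel_ridge(K, y, lam=1e-3)
print("training RMSE:", np.sqrt(np.mean((K @ alpha - y) ** 2)))
```

Other members of the family, such as principal component regression or gradient descent with early stopping, replace the map \(\lambda \mapsto (K + n\lambda I)^{-1}\) by a different spectral filter of \(K\).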
We consider the optimization of a quadratic objective function whose gradients are only accessible through a stochastic oracle that returns the gradient at any given point plus a zero-mean finite variance random error. We present the first algorithm that a ...
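The oracle model can be made concrete with a toy run (illustrative only; this is plain SGD with Polyak-Ruppert averaging, not the algorithm the paper introduces): each gradient query returns the true gradient plus zero-mean, finite-variance noise.

```python
# Illustration of the stochastic-oracle model on a quadratic objective:
# SGD with Polyak-Ruppert averaging, where each query returns the exact
# gradient plus zero-mean, finite-variance noise.
import numpy as np

rng = np.random.default_rng(4)
d = 30
H = np.diag(rng.uniform(0.1, 1.0, d))    # quadratic f(x) = 0.5 x^T H x
x = rng.standard_normal(d)
x_avg, T = np.zeros(d), 10000
for t in range(T):
    g = H @ x + 0.1 * rng.standard_normal(d)   # noisy gradient oracle
    x -= 0.5 * g                                # constant step < 2/L (L = 1)
    x_avg += x / T                              # running average of iterates

print("f(last):", 0.5 * x @ H @ x, " f(avg):", 0.5 * x_avg @ H @ x_avg)
```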
Generalized linear models, where a random vector x is observed through a noisy, possibly nonlinear, function of a linear transform z = A x, arise in a range of applications in nonlinear filtering and regression. Approximate message passing (AMP) methods, b ...
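Concretely, the observation model is

\[
z = A x, \qquad y_i \sim p(y_i \mid z_i) \ \text{componentwise},
\]

so each measurement depends on \(x\) only through one coordinate of the linear transform, via a known (possibly nonlinear, noisy) scalar channel; AMP-style methods exploit this structure by alternating a linear step involving \(A\) with scalar denoising steps matched to the channel and to the prior on \(x\). This restates the standard GLM/AMP setup, not details specific to this paper.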