Optimal Convergence for Distributed Learning with Stochastic Gradient Methods and Spectral Algorithms
The nonparametric learning of positive-valued functions appears widely in machine learning, especially in the context of estimating intensity functions of point processes. Yet, existing approaches either require computing expensive projections or semidefin ...
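One way to picture the positivity constraint mentioned above is a squared link, f(x) = g(x)^2 with g in an RKHS, which keeps the model nonnegative without any projection or semidefinite step. The sketch below is a toy under that assumption; the names `rbf_kernel` and `fit_intensity` and all hyperparameters are illustrative, not taken from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix between rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_intensity(X, y, lam=1e-2, lr=0.1, steps=500):
    """Fit f(x) = (K @ alpha)**2 to nonnegative targets y by gradient descent.

    The squared link keeps f >= 0 everywhere, so no projection is needed.
    """
    K = rbf_kernel(X, X)
    alpha = np.zeros(len(X))
    for _ in range(steps):
        g = K @ alpha
        resid = g ** 2 - y
        # gradient of (1/2n) * sum((g_i^2 - y_i)^2) + (lam/2) * alpha' K alpha
        grad = K @ (2 * resid * g) / len(X) + lam * (K @ alpha)
        alpha -= lr * grad
    return alpha
```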
In the presence of sparse noise, we propose kernel regression for predicting output vectors that are smooth over a given graph. Sparse noise models training outputs corrupted either by missing samples or by large perturbations. The presence of sparse ...
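A hedged sketch of one way such an estimator could look: kernel regression with a graph-Laplacian smoothness penalty plus a Huber loss, whose bounded gradient absorbs sparse, large perturbations. The loss choice and the names `huber_grad`, `fit`, `lam`, `mu`, and `delta` are assumptions for illustration, not the paper's estimator.

```python
import numpy as np

def huber_grad(r, delta=1.0):
    """Gradient of the Huber loss: linear in the bulk, clipped in the tails."""
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

def fit(K, L, y, lam=1e-2, mu=1e-1, lr=0.05, steps=1000):
    """Minimize Huber data fit + RKHS norm + graph smoothness mu/2 * f' L f,
    with f = K @ alpha, K a kernel matrix and L a graph Laplacian."""
    alpha = np.zeros_like(y)
    for _ in range(steps):
        f = K @ alpha
        grad = (K @ huber_grad(f - y) / len(y)   # robust data-fit term
                + lam * (K @ alpha)              # RKHS regularizer
                + mu * (K @ (L @ f)))            # graph smoothness term
        alpha -= lr * grad
    return alpha
```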
Mini-batch stochastic gradient descent (SGD) is the state of the art in large-scale distributed training. The scheme can reach a linear speedup with respect to the number of workers, but this is rarely seen in practice, as the scheme often suffers from large ne ...
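The speedup claim rests on a variance argument: averaging B stochastic gradients divides their variance by B, which is what permits a step size roughly B times larger. A toy least-squares demonstration (all constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def sgd(batch_size, lr, steps=2000):
    """Mini-batch SGD on least squares; returns distance to the true weights."""
    w = np.zeros(d)
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch_size)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        w -= lr * grad
    return np.linalg.norm(w - w_true)

print(sgd(batch_size=1,  lr=0.01))
# averaging 16 gradients cuts variance 16x, so the step size can scale up
print(sgd(batch_size=16, lr=0.01 * 16))
```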
The functional linear model extends the notion of linear regression to the case where the response and covariates are iid elements of an infinite-dimensional Hilbert space. The unknown to be estimated is a Hilbert-Schmidt operator, whose inverse is by defi ...
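The ill-posedness hinted at here (the inverse of the covariance-type operator is unbounded) is conventionally handled by spectral regularization. A minimal Tikhonov sketch on a discretized symmetric operator; the helper name `tikhonov_estimate` and the filter choice are illustrative, not the paper's estimator.

```python
import numpy as np

def tikhonov_estimate(C, g, lam=1e-3):
    """Solve the ill-posed equation C b = g, with C symmetric PSD, by the
    Tikhonov spectral filter 1/(e + lam) applied in C's eigenbasis."""
    evals, evecs = np.linalg.eigh(C)
    filt = 1.0 / (evals + lam)   # damps the blow-up of 1/e for small e
    return evecs @ (filt * (evecs.T @ g))
```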
This paper examines the mean-square error performance of diffusion stochastic algorithms under a generalized coordinate-descent scheme. In this setting, the adaptation step by each agent is limited to a random subset of the coordinates of its stochastic gr ...
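A toy adapt-then-combine diffusion network in which each agent updates only a random coordinate subset per iteration may help fix ideas; the combination matrix, step size, and masking probability below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
n_agents, d = 5, 8
A = np.full((n_agents, n_agents), 1.0 / n_agents)  # doubly stochastic combiner
w_true = rng.normal(size=d)
W = np.zeros((n_agents, d))

for step in range(3000):
    # adapt: each agent takes an LMS step on a random coordinate subset
    for k in range(n_agents):
        x = rng.normal(size=d)
        y = x @ w_true + 0.05 * rng.normal()
        g = -(y - x @ W[k]) * x           # stochastic gradient of (1/2)(y - x.w)^2
        coords = rng.random(d) < 0.5      # random coordinate mask
        W[k, coords] -= 0.02 * g[coords]
    # combine: diffuse the estimates over the network
    W = A @ W

print(np.linalg.norm(W.mean(axis=0) - w_true))
```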
At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit [12, 9], thus connecting them to kernel methods. We prove that the evolution of an ANN during training can also be described by a kernel: ...
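The kernel in question is the neural tangent kernel, K(x, x') = ⟨∇θ f(x), ∇θ f(x')⟩. A minimal sketch of its empirical version for a one-hidden-layer ReLU network; the width and scalings are illustrative, and as width grows this kernel concentrates and stays nearly constant during training.

```python
import numpy as np

rng = np.random.default_rng(0)
width, d = 4096, 3
W1 = rng.normal(size=(width, d)) / np.sqrt(d)
w2 = rng.normal(size=width) / np.sqrt(width)

def grads(x):
    """Gradient of f(x) = w2 . relu(W1 x) w.r.t. all parameters, flattened."""
    h = W1 @ x
    a = np.maximum(h, 0.0)
    dW1 = np.outer(w2 * (h > 0), x)          # d f / d W1 (ReLU subgradient)
    return np.concatenate([dW1.ravel(), a])  # [d f/d W1, d f/d w2]

x1, x2 = rng.normal(size=d), rng.normal(size=d)
print(grads(x1) @ grads(x2))                 # empirical NTK entry K(x1, x2)
```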
We study generalization properties of distributed algorithms in the setting of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We investigate distributed stochastic gradient methods (SGM), with mini-batches and multi-passes over th ...
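A hedged sketch of the divide-and-conquer flavor of distributed SGM: partition the data across workers, run multi-pass mini-batch kernel SGD locally, then average the local estimators. Step sizes, pass counts, and the partitioning below are illustrative, not the paper's tuned choices.

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    return np.exp(-gamma * ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1))

def local_sgm(X, y, lr=0.5, passes=5, batch=8, rng=None):
    """Multi-pass mini-batch kernel SGD on one worker's shard."""
    K = rbf(X, X)
    alpha = np.zeros(len(X))
    for _ in range(passes * len(X) // batch):
        idx = rng.choice(len(X), size=batch, replace=False)
        resid = K[idx] @ alpha - y[idx]
        alpha[idx] -= lr * resid / batch     # SGD step in the RKHS coefficients
    return alpha

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=200)
parts = np.array_split(rng.permutation(200), 4)            # 4 workers
models = [(X[p], local_sgm(X[p], y[p], rng=rng)) for p in parts]
f_avg = lambda x: np.mean([rbf(x, Xp) @ a for Xp, a in models], axis=0)
print(np.mean((f_avg(X) - np.sin(3 * X[:, 0])) ** 2))      # error vs. noiseless target
```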
Additive models form a widely popular class of regression models which represent the relation between covariates and response variables as the sum of low-dimensional transfer functions. Besides flexibility and accuracy, a key benefit of these models is the ...
Institute of Electrical and Electronics Engineers, 2017
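The classical way to fit the additive models described above is backfitting: cyclically re-estimate each transfer function f_j against the partial residual that excludes it. A minimal sketch with a Nadaraya-Watson smoother; the smoother choice and bandwidth are assumptions for illustration.

```python
import numpy as np

def smooth_1d(x, r, bw=0.3):
    """Nadaraya-Watson smoother of residuals r against a 1-D covariate x."""
    W = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * bw ** 2))
    return (W @ r) / W.sum(axis=1)

def backfit(X, y, sweeps=20):
    """Fit y = sum_j f_j(x_j) by cycling over coordinates."""
    n, d = X.shape
    F = np.zeros((n, d))
    for _ in range(sweeps):
        for j in range(d):
            partial = y - F.sum(axis=1) + F[:, j]  # residual excluding f_j
            F[:, j] = smooth_1d(X[:, j], partial)
            F[:, j] -= F[:, j].mean()              # center for identifiability
    return F

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 2))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=300)
F = backfit(X, y)
```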
A data-driven reduced basis (RB) method for parametrized time-dependent problems is proposed. This method requires the offline preparation of a database comprising the time history of the full-order solutions at parameter locations. Based on the full-order ...
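One standard way to extract a reduced basis from such a snapshot database, whether or not it is this paper's exact construction, is proper orthogonal decomposition (POD): stack the full-order solutions into a matrix and truncate its SVD. A minimal sketch, with the energy tolerance as an illustrative knob.

```python
import numpy as np

def pod_basis(snapshots, tol=1e-6):
    """snapshots: (n_dof, n_snapshots) matrix of full-order solutions.
    Returns the leading left singular vectors capturing 1 - tol of the energy."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(energy, 1.0 - tol)) + 1
    return U[:, :r]                      # reduced basis of dimension r

# online stage: project a new full-order state u onto the basis, u_r = V.T @ u
```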
The strong growth condition (SGC) is known to be a sufficient condition for linear convergence of the stochastic gradient method using a constant step-size γ (SGM-CS). In this paper, we provide a necessary condition for the linear convergence of SGM-CS, t ...
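For reference, a common statement of the strong growth condition (the exact constants and form in the paper may differ) bounds the second moment of the stochastic gradients by the full gradient:

```latex
\mathbb{E}_i\!\left[\lVert \nabla f_i(x)\rVert^2\right] \;\le\; c\,\lVert \nabla F(x)\rVert^2
\qquad \text{for all } x, \quad F(x) = \tfrac{1}{n}\sum_{i=1}^{n} f_i(x).
```

Under this condition the gradient noise vanishes at stationary points, which is what allows a constant step size to yield linear convergence.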