The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Updates

We analyze (stochastic) gradient descent (SGD) with delayed updates on smooth quasi-convex and non-convex functions and derive concise, non-asymptotic convergence rates. We show that in all cases the rate of convergence consists of two terms: (i) a stochastic term that is not affected by the delay, and (ii) a higher-order deterministic term that is only linearly slowed down by the delay. Thus, in the presence of noise, the effects of the delay become negligible after a few iterations, and the algorithm converges at the same optimal rate as standard SGD. This result extends a line of research that had shown similar results only in the asymptotic regime or only for strongly convex quadratic functions.

Deep Learning Theory Through the Lens of Diagonal Linear Networks

Topics in statistical physics of high-dimensional machine learning

Enabling Uncertainty Estimation in Iterative Neural Networks