The present invention concerns computer-implemented methods for training a machine learning model using Stochastic Gradient Descent (SGD). In one embodiment, the method is performed by a first computer in a distributed computing environment and comprises performing a learning round. The learning round comprises broadcasting a parameter vector to a plurality of worker computers in the distributed computing environment and, upon receipt of one or more respective estimate vectors from a subset of the worker computers, determining an updated parameter vector for use in a next learning round based on the one or more received estimate vectors. The determining comprises ignoring an estimate vector received from a given worker computer when the sending frequency of that worker computer is above a threshold value. The method aggregates the gradients in an asynchronous communication model with unbounded communication delays.
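The round described above can be sketched as follows. This is a minimal illustrative sketch, not the patented method: the function name `run_learning_round`, the definition of sending frequency as cumulative sends divided by the round index, the coordinate-wise averaging of accepted estimates, and the fixed learning rate are all assumptions, since the abstract does not specify the aggregation rule or how frequency is measured.

```python
def run_learning_round(params, worker_estimates, send_counts, round_idx, freq_threshold):
    """One learning round at the first computer (illustrative sketch).

    params           -- current parameter vector (list of floats)
    worker_estimates -- list of (worker_id, estimate_vector) received this round
    send_counts      -- mutable dict tracking cumulative sends per worker
    round_idx        -- 1-based index of the current learning round
    freq_threshold   -- maximum allowed sends-per-round before a worker is ignored
    """
    accepted = []
    for worker_id, estimate in worker_estimates:
        send_counts[worker_id] = send_counts.get(worker_id, 0) + 1
        # Assumed frequency measure: cumulative sends divided by rounds elapsed.
        frequency = send_counts[worker_id] / round_idx
        if frequency > freq_threshold:
            continue  # ignore estimate from an over-frequent worker
        accepted.append(estimate)
    if not accepted:
        return params  # no usable estimates this round; keep parameters unchanged
    # Assumed aggregation: coordinate-wise average of the accepted estimates,
    # followed by a gradient-descent-style step with a fixed learning rate.
    avg = [sum(coords) / len(coords) for coords in zip(*accepted)]
    lr = 0.1
    return [p - lr * g for p, g in zip(params, avg)]
```

With a threshold of 1.5 sends per round, a worker that submits two estimates in round 1 has its second estimate ignored (frequency 2.0 > 1.5), while a worker submitting once is accepted, illustrating how the filter discards over-frequent senders without waiting for slow workers.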
Volkan Cevher, Kimon Antonakopoulos
Patrick Thiran, Negar Kiyavash, Saber Salehkaleybar