Byzantine Fault-Tolerant Distributed Machine Learning with Norm-Based Comparative Gradient Elimination

This paper considers the Byzantine fault-tolerance problem in the distributed stochastic gradient descent (D-SGD) method, a popular algorithm for distributed multi-agent machine learning. In this problem, each agent samples data points independently from a certain data-generating distribution. In the fault-free case, the D-SGD method allows all the agents to learn a mathematical model that best fits the data collectively sampled by all agents. We consider the case when a fraction of the agents may be Byzantine faulty. Such faulty agents may not follow the prescribed algorithm correctly, and may render the traditional D-SGD method ineffective by sharing arbitrary incorrect stochastic gradients. We propose a norm-based gradient-filter, named comparative gradient elimination (CGE), that robustifies the D-SGD method against Byzantine agents. We show that the CGE gradient-filter guarantees fault-tolerance against a bounded fraction of Byzantine agents under standard stochastic assumptions, and is computationally simpler than many existing gradient-filters such as multi-KRUM, geometric median-of-means, and the spectral filters. We empirically show, by simulating distributed learning on neural networks, that the fault-tolerance of CGE is comparable to that of existing gradient-filters. We also empirically show that exponential averaging of stochastic gradients improves the fault-tolerance of a generic gradient-filter.
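
The abstract does not spell out the filtering rule, so the following is only a minimal sketch of how a norm-based filter of this kind might look. It assumes that the server knows an upper bound `f` on the number of Byzantine agents, sorts the received gradients by Euclidean norm, discards the `f` gradients with the largest norms, and averages the rest. The function names `cge_filter` and `exponential_average`, the parameter `beta`, and the choice of averaging rather than summing are illustrative assumptions, not taken from the paper.

```python
import math


def cge_filter(gradients, f):
    """Hypothetical norm-based filter: drop the f gradients with the
    largest Euclidean norms, then average the remaining n - f gradients.
    (Assumed rule; the abstract only says the filter is norm-based.)"""
    n = len(gradients)
    if not 0 <= f < n:
        raise ValueError("f must satisfy 0 <= f < n")
    # Sort the gradients by Euclidean norm, smallest first.
    by_norm = sorted(gradients, key=lambda g: math.sqrt(sum(x * x for x in g)))
    kept = by_norm[: n - f]
    dim = len(kept[0])
    # Coordinate-wise average of the gradients that survive the filter.
    return [sum(g[i] for g in kept) / len(kept) for i in range(dim)]


def exponential_average(previous, gradient, beta=0.9):
    """Exponentially weighted average of an agent's stochastic gradients.
    beta is an assumed smoothing parameter, not a value from the paper."""
    return [beta * p + (1 - beta) * g for p, g in zip(previous, gradient)]


# Three honest gradients and one large-norm Byzantine gradient.
honest = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
byzantine = [[100.0, -100.0]]
aggregate = cge_filter(honest + byzantine, f=1)
smoothed = exponential_average([0.0, 0.0], [1.0, 2.0], beta=0.5)
```

In this toy run the Byzantine gradient has by far the largest norm, so it is discarded and `aggregate` is the mean of the three honest gradients. The filter needs only a sort by norm, which fits the abstract's claim that CGE is computationally simpler than filters such as multi-KRUM. A Byzantine gradient with a small norm would pass through this filter, so the sketch says nothing about the conditions under which the paper's guarantee holds.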


Deep Learning Theory Through the Lens of Diagonal Linear Networks

Optimization Algorithms for Decentralized, Distributed and Collaborative Machine Learning

Generalization of Scaled Deep ResNets in the Mean-Field Regime
