Algorithms are everywhere. The recipe for frangipane cake is an algorithm. If all the listed ingredients are available and the cook is sufficiently deft, after a finite number of small, well-defined steps a delicious dessert will exit the oven. Now, what if the grocer who sold the cook the ingredients mislabeled some of them? If salt had been mistakenly labeled and sold as sugar, those tasting the cake would probably be disgusted and stop eating. If a dangerous drug had been maliciously sold as sugar, the consequences could be far more severe.

When it comes to security, machine learning is perhaps closer to cooking than it is to other families of computer programs. Follow the recipe for Margherita pizza using cream and onions instead of tomatoes and mozzarella, and you may pull a Flammekueche out of the oven. Likewise, what the practitioner gets at the end of training certainly depends on the relevance of the model and other hyperparameters, but ultimately the output parameters are determined by the inputs: the training set. While in cookery we simply trust the supply chain and blame the cook for mistakes, we should expect machine learning algorithms to gracefully handle (partially) malformed and malicious datasets and inputs, as we do for any other computer program.

Byzantine machine learning studies malicious behaviors during the training phase. A particular, growing body of work has been tackling Byzantine failures in the context of distributed Stochastic Gradient Descent (SGD): a central server distributes gradient computations to several worker nodes, some of them adversarial, and must safely aggregate the gradients it receives back.

On the theoretical side, this thesis advances the state of the art on two fronts. Existing works had always assumed a trusted, central parameter server. We propose a novel algorithm that offers the same proven guarantees as previous works, but requires no node to be trusted and can operate under network asynchrony. Two other contributions tackle problems either in the construction of, or in an assumption common to, statistically robust Gradient Aggregation Rules (GARs), which made them very vulnerable to attacks in practice. We put substantial emphasis on devising pragmatic, easy-to-implement and computationally cheap techniques to address these issues.

On the system side, this thesis implements many of the existing and contributed GARs, integrated into two major machine learning frameworks (PyTorch and TensorFlow). These algorithms are assessed in settings ranging from microbenchmarks on a single GPU to actual networked deployments on many nodes. The associated code has been open-sourced.
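To make the notion of a robust GAR concrete, below is a minimal sketch of one classic statistically robust rule from the literature, the coordinate-wise median; it is an illustration, not any of the thesis's own contributed algorithms, and the function name and toy data are ours. For each coordinate of the gradient vector, the median discards the influence of a minority of arbitrarily bad values, unlike plain averaging.

```python
import torch

def coordinate_wise_median(gradients):
    """Aggregate worker gradients with the coordinate-wise median.

    gradients: list of 1-D tensors, one per worker (honest or Byzantine).
    Returns a single aggregated gradient tensor. For each coordinate,
    the median is unaffected by a minority of arbitrary outliers,
    whereas a single Byzantine worker can shift the mean arbitrarily.
    """
    stacked = torch.stack(gradients)      # shape: (n_workers, dim)
    return stacked.median(dim=0).values   # per-coordinate median

# Toy usage: 4 honest workers near the true gradient, 1 Byzantine worker.
dim = 5
honest = [torch.randn(dim) * 0.1 + 1.0 for _ in range(4)]
byzantine = [torch.full((dim,), 1e6)]     # arbitrary malicious vector
print(coordinate_wise_median(honest + byzantine))  # stays close to 1.0
```

With plain averaging, the single malicious vector above would drag every coordinate of the aggregate to roughly 2e5; the median ignores it entirely, which is the intuition behind the statistically robust GARs this thesis studies and strengthens.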