The problem of Byzantine resilience in distributed machine learning, also known as Byzantine machine learning, consists of designing distributed algorithms that can train an accurate model despite the presence of Byzantine nodes, i.e., nodes with corrupt data or machines that can misbehave arbitrarily. Many solutions to this important problem have been proposed, most of which build upon the classical stochastic gradient descent (SGD) scheme. Yet the literature lacks a unifying structure for this emerging field, and consequently the general understanding of the principles of Byzantine machine learning remains poor. This paper addresses this issue by presenting a primer on Byzantine machine learning. In particular, we introduce three pillars of Byzantine machine learning, namely the concepts of breakdown point, robustness, and gradient complexity, to assess the efficacy of a solution. This systematization enables us to (i) bring forth the merits and limitations of state-of-the-art solutions, and (ii) pave a clear path for future advancements in the field.
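
To make the setting concrete, the following minimal sketch contrasts plain averaging of worker gradients, as in standard distributed SGD, with a robust aggregation rule when one worker is Byzantine. The abstract does not name a specific rule; the coordinate-wise median is used here only as a well-known illustrative choice, and the function names and toy gradient values are hypothetical.

```python
import statistics


def mean_aggregate(gradients):
    """Coordinate-wise average of the workers' gradient vectors."""
    return [statistics.fmean(coords) for coords in zip(*gradients)]


def median_aggregate(gradients):
    """Coordinate-wise median of the workers' gradient vectors.

    Illustrative robust rule (an assumption, not taken from the abstract).
    """
    return [statistics.median(coords) for coords in zip(*gradients)]


# Hypothetical toy setup: 4 honest workers report gradients near [1.0, -2.0],
# and 1 Byzantine worker reports an arbitrary vector.
honest = [[1.0, -2.0], [1.1, -1.9], [0.9, -2.1], [1.0, -2.0]]
byzantine = [[1e6, 1e6]]
reports = honest + byzantine

avg = mean_aggregate(reports)     # dominated by the single Byzantine report
med = median_aggregate(reports)   # stays close to the honest gradients

print("mean:  ", avg)
print("median:", med)
```

In this toy case a single arbitrary report is enough to corrupt the average, whereas the median remains near the honest values as long as honest workers form a majority. This loosely illustrates the notion of a breakdown point mentioned above; the sketch says nothing about robustness guarantees or gradient complexity.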