In this thesis, we explore techniques for addressing the communication bottleneck in data-parallel distributed training of deep learning models. We investigate algorithms that either reduce the size of the messages exchanged between workers, or reduce the number of messages sent and received.

To reduce the size of messages, we propose an algorithm for lossy compression of gradients. This algorithm is compatible with existing high-performance training pipelines based on the all-reduce primitive, and it leverages the natural approximate low-rank structure in the gradients of neural network layers to obtain high compression rates.

To reduce the number of messages, we study the decentralized learning paradigm, in which workers do not average their model updates all-to-all in each step of Stochastic Gradient Descent, but only communicate with a small subset of their peers. We extend the aforementioned compression algorithm to operate in this setting. We also study the influence of the communication topology on the performance of decentralized learning, highlighting shortcomings of the typical 'spectral gap' metric for measuring the quality of communication topologies and proposing a new framework for evaluating them.

Finally, we propose an alternative communication paradigm for distributed learning over sparse topologies. This paradigm, based on the concept of 'relaying' updates over spanning trees of the communication topology, shows benefits over the typical gossip-based approach, especially when workers have very heterogeneous data distributions.
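To make the low-rank compression idea concrete, below is a minimal single-worker sketch of rank-1 gradient compression via one step of power iteration, written in NumPy. The function names (`compress_rank1`, `decompress_rank1`) and the warm-started factor `prev_q` are illustrative assumptions rather than the thesis' actual interface; the point is only that a gradient matrix with approximate low-rank structure can be summarized by two small factors, which are cheap enough to average across workers with a standard all-reduce.

```python
import numpy as np

def compress_rank1(grad, prev_q=None):
    """Rank-1 approximation of a layer's gradient matrix `grad` (m x n)
    using one step of power iteration, warm-started from the previous
    right factor `prev_q` when available."""
    m, n = grad.shape
    q = prev_q if prev_q is not None else np.random.randn(n)
    q /= np.linalg.norm(q) + 1e-12
    p = grad @ q                      # left factor, shape (m,)
    p /= np.linalg.norm(p) + 1e-12
    q = grad.T @ p                    # right factor, shape (n,)
    return p, q                       # m + n numbers instead of m * n

def decompress_rank1(p, q):
    """Reconstruct the (lossy) gradient estimate from the two factors."""
    return np.outer(p, q)

# Toy usage: an approximately rank-1 gradient is compressed well.
grad = np.outer(np.random.randn(256), np.random.randn(128))
grad += 0.01 * np.random.randn(256, 128)        # small full-rank noise
p, q = compress_rank1(grad)
error = np.linalg.norm(grad - decompress_rank1(p, q)) / np.linalg.norm(grad)
print(f"relative reconstruction error: {error:.3f}")
```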
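As an illustration of decentralized averaging and of the 'spectral gap' metric mentioned above, here is a small NumPy sketch under the simplifying assumption of a fixed ring topology with uniform neighbour weights; `ring_gossip_matrix`, `spectral_gap`, and `gossip_step` are hypothetical helper names chosen for this example.

```python
import numpy as np

def ring_gossip_matrix(n):
    """Doubly stochastic mixing matrix of a ring topology: every worker
    averages its parameters with its two neighbours (weight 1/3 each)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W

def spectral_gap(W):
    """1 minus the second-largest eigenvalue magnitude of the mixing
    matrix: the classical proxy for how fast gossip averaging mixes."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(W)))[::-1]
    return 1.0 - eigvals[1]

def gossip_step(W, models):
    """One gossip round: row i of `models` (n x d) is replaced by the
    weighted average of worker i's neighbourhood."""
    return W @ models

n, d = 16, 4
W = ring_gossip_matrix(n)
models = np.random.randn(n, d)
for _ in range(50):
    models = gossip_step(W, models)
print("spectral gap:", round(spectral_gap(W), 4))
print("spread after 50 rounds:", round(np.ptp(models, axis=0).max(), 4))
```

A larger spectral gap means faster mixing of this averaging process; one point made in the abstract is that this single number can be a poor predictor of actual optimization performance, which motivates the alternative evaluation framework proposed in the thesis.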
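Finally, the relaying paradigm over a spanning tree can be illustrated with a toy simulation in which every worker holds one fixed update vector: after enough rounds, each worker recovers the exact sum of all updates while only ever communicating with its tree neighbours. The data structures and names below (`relay_round`, the `messages` dictionary) are illustrative assumptions; the method in the thesis interleaves relaying with optimization steps and therefore has to handle delayed updates, which this sketch ignores.

```python
import numpy as np

def relay_round(tree, updates, messages):
    """One relay exchange on a tree. The message sent from i to j carries
    i's own update plus everything i received from its *other* neighbours
    in the previous round; because the graph is a tree, nothing is ever
    counted twice."""
    new_messages = {}
    for i, neighbours in tree.items():
        for j in neighbours:
            others = sum(messages[(k, i)] for k in neighbours if k != j)
            new_messages[(i, j)] = updates[i] + others
    return new_messages

# A path topology 0 - 1 - 2 - 3 (a spanning tree of some denser graph).
tree = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
d = 3
updates = {i: np.random.randn(d) for i in tree}
messages = {(i, j): np.zeros(d) for i in tree for j in tree[i]}

for _ in range(3):        # at least diameter-many rounds (3 for this path)
    messages = relay_round(tree, updates, messages)

totals = {i: updates[i] + sum(messages[(k, i)] for k in tree[i]) for i in tree}
exact = sum(updates.values())
print(max(np.linalg.norm(totals[i] - exact) for i in tree))   # ~0.0
```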