NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent (SGD) that can be deployed for parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD provides strong theoretical guarantees; however, for practical purposes, the authors proposed a heuristic variant, which we call QSGDinf, that demonstrated impressive empirical gains for distributed training of large neural networks. In this paper, we build on this work to propose a new gradient quantization scheme, and show that it has stronger theoretical guarantees than QSGD and matches or exceeds the empirical performance of the QSGDinf heuristic and of other compression methods.
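
The quantization step mentioned above can be illustrated with a small sketch. This is not the authors' implementation. It shows unbiased stochastic quantization of a gradient vector onto a fixed set of levels in [0, 1], scaled by the vector's norm, using either QSGD-style uniform levels {0, 1/s, ..., 1} or a hypothetical nonuniform, exponentially spaced set {0, 2^-s, ..., 1/2, 1}. The function names, the exact level sets, and the `use_max_norm` option (our assumption about the max-norm scaling associated with QSGDinf) are illustrative choices; the encoding stage that follows quantization is omitted.

```python
import bisect
import math
import random
from typing import List, Sequence


def uniform_levels(s: int) -> List[float]:
    """QSGD-style uniform levels 0, 1/s, ..., 1 (requires s >= 1)."""
    return [j / s for j in range(s + 1)]


def exponential_levels(s: int) -> List[float]:
    """Hypothetical nonuniform levels 0, 2^-s, ..., 1/2, 1 (an assumption,
    not necessarily the level set used in the paper)."""
    return [0.0] + [2.0 ** (j - s) for j in range(s + 1)]


def quantize(v: Sequence[float], levels: Sequence[float],
             rng: random.Random, use_max_norm: bool = False) -> List[float]:
    """Unbiased stochastic quantization: each entry becomes
    sign(x) * norm * level, where level is one of the two levels bracketing
    |x| / norm, chosen at random so that the expected output equals x."""
    if use_max_norm:
        # Assumed QSGDinf-like scaling by the largest absolute entry.
        norm = max((abs(x) for x in v), default=0.0)
    else:
        norm = math.sqrt(sum(x * x for x in v))  # L2 scaling, as in QSGD
    if norm == 0.0:
        return [0.0] * len(v)
    out = []
    for x in v:
        r = abs(x) / norm  # normalized magnitude, in [0, 1]
        # Index j with levels[j] <= r <= levels[j + 1].
        j = min(bisect.bisect_right(levels, r) - 1, len(levels) - 2)
        lo, hi = levels[j], levels[j + 1]
        # Round up with probability (r - lo) / (hi - lo), so E[level] = r.
        p_up = (r - lo) / (hi - lo)
        level = hi if rng.random() < p_up else lo
        out.append(math.copysign(norm * level, x))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    grad = [0.3, -1.2, 0.05, 2.0]
    print(quantize(grad, uniform_levels(4), rng))
    print(quantize(grad, exponential_levels(3), rng))
```

Both level sets give an unbiased estimate by construction, so they differ in variance rather than in expectation. With s + 2 levels, the exponential grid concentrates its resolution near zero, which, as we understand the motivation for nonuniform quantization, is where most normalized coordinates of a high-dimensional gradient fall; a uniform grid with a similar number of levels is coarse in that region.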

Subjective performance evaluation of bitrate allocation strategies for MPEG and JPEG Pleno point cloud compression

Communication-efficient distributed training of machine learning models

Evaluating the effect of sparse convolutions on point cloud compression