Error Feedback Fixes SignSGD and other Gradient Compression Schemes

Sign-based algorithms (e.g., signSGD) have been proposed as a biased gradient compression technique to alleviate the communication bottleneck in training large neural networks across multiple workers. We show simple convex counter-examples where signSGD does not converge to the optimum. Further, even when it does converge, signSGD may generalize poorly compared with SGD. These issues arise from the biased nature of the sign compression operator. We then show that using error feedback, i.e., incorporating the error made by the compression operator into the next step, overcomes these issues. We prove that our algorithm (EF-SGD) with an arbitrary compression operator achieves the same rate of convergence as SGD without any additional assumptions. Thus, EF-SGD achieves gradient compression for free. Our experiments thoroughly substantiate the theory.
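
The error-feedback idea described above can be illustrated with a minimal single-worker sketch. It is not taken from the abstract: the function names (`scaled_sign`, `ef_sgd`), the choice of a sign compressor scaled by the mean absolute value, and the constant step size `lr` are assumptions made for illustration only. Each step adds the stored compression error to the scaled gradient, applies the compressed result, and keeps whatever the compressor discarded for the next step.

```python
def scaled_sign(p):
    """Assumed compressor: signs of p, scaled by the mean absolute value of p."""
    scale = sum(abs(v) for v in p) / len(p)
    return [scale * ((v > 0) - (v < 0)) for v in p]


def ef_sgd(grad, x0, lr, steps, compress=scaled_sign):
    """Error-feedback SGD sketch; `grad` maps a point to a (stochastic) gradient."""
    x = list(x0)
    e = [0.0] * len(x)  # accumulated compression error, initially zero
    for _ in range(steps):
        g = grad(x)
        # Add the error left over from the previous step to the scaled gradient.
        p = [lr * gi + ei for gi, ei in zip(g, e)]
        delta = compress(p)  # this is all that would need to be communicated
        x = [xi - di for xi, di in zip(x, delta)]
        # Remember what the compressor dropped so it is applied later.
        e = [pi - di for pi, di in zip(p, delta)]
    return x, e


# Usage: a hypothetical constant gradient, two steps from the origin.
x, e = ef_sgd(lambda x: [3.0, -1.0], [0.0, 0.0], lr=1.0, steps=2)
```

In this sketch the quantity `x - e` always equals the iterate that uncompressed SGD would have reached using the same gradients, which is the sense in which the compression error is deferred rather than lost.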

Enabling Uncertainty Estimation in Iterative Neural Networks

Fundamental Limits in Statistical Learning Problems: Block Models and Neural Networks

The JPEG AI Standard: Providing Efficient Human and Machine Visual Data Consumption