Dan Alistarh, Fartash Faghri
As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression method for data-para ...
Microtome Publishing, 2021