, , ,
Modern machine learning architectures distinguish servers and workers. Typically, a d-dimensional model is hosted by a server and trained by n workers, using a distributed stochastic gradient descent (SGD) optimization scheme. At each SGD step, the goal is ...
Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik2021