Publication
Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks
Related publications (31)
On the Generalization of Stochastic Gradient Descent with Momentum
Volkan Cevher, Kimon Antonakopoulos
While momentum-based accelerated variants of stochastic gradient descent (SGD) are widely used when training machine learning models, there is little theoretical understanding of the generalization error of such methods. In this work, we first show that th ...
Microtome Publishing, 2024