Publication

Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks

Related publications (31)

On the Generalization of Stochastic Gradient Descent with Momentum

Volkan Cevher, Kimon Antonakopoulos

While momentum-based accelerated variants of stochastic gradient descent (SGD) are widely used when training machine learning models, there is little theoretical understanding of the generalization error of such methods. In this work, we first show that th ...
Microtome Publishing, 2024