This lecture covers stochastic adaptive first-order methods that converge without knowledge of the smoothness constant by exploiting information from past stochastic gradients. It introduces variable metric stochastic gradient descent and adaptive gradient methods, which adapt locally by building a metric (a surrogate for the Hessian) from accumulated stochastic gradient information. The lecture then discusses AdaGrad, AcceleGrad, RMSProp, and ADAM, highlighting their properties and convergence rates, and compares these adaptive algorithms in terms of optimization performance and generalization. The implications of implicit regularization in adaptive methods and the generalization behaviour of adaptive learning methods are also explored.
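As a concrete illustration (not part of the lecture text), the diagonal AdaGrad-style update underlying these adaptive methods rescales each coordinate by the root of the accumulated squared past gradients. A minimal sketch in Python, with `lr` and `eps` as assumed hyperparameter names:

```python
import numpy as np

def adagrad_step(x, grad, accum, lr=0.1, eps=1e-8):
    """One diagonal AdaGrad update: each coordinate's step is scaled by
    the square root of the running sum of its squared past gradients."""
    accum = accum + grad ** 2                    # accumulate squared gradients
    x = x - lr * grad / (np.sqrt(accum) + eps)   # per-coordinate adaptive step
    return x, accum

# Usage sketch: iterate with stochastic gradients g_t of the objective
x = np.zeros(5)
accum = np.zeros(5)
for _ in range(100):
    g = np.random.randn(5)                       # placeholder stochastic gradient
    x, accum = adagrad_step(x, g, accum)
```

RMSProp and ADAM replace the cumulative sum by an exponential moving average (ADAM additionally averages the gradient itself), which is what distinguishes the variants compared in the lecture.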