This lecture covers Stochastic Gradient Descent (SGD) with averaging and compares it with Gradient Descent (GD). It motivates iterate averaging as a way to dampen the oscillations of SGD, discusses different averaging schemes and their impact on convergence rates, and examines why SGD and variants such as Mini-batch SGD and SGD with Momentum are well suited to large-scale optimization. It then turns to the challenges of non-convex stochastic optimization and how SGD performs in that setting, and concludes with sparse recovery techniques and the Lasso, formulated as a non-smooth convex minimization problem.
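To make the central idea concrete, below is a minimal sketch of SGD with uniform (Polyak–Ruppert style) iterate averaging on a least-squares problem. The data, step size, and function names are illustrative assumptions, not material from the lecture itself; the point is only that the running average of the iterates typically fluctuates less than the last iterate.

```python
# Sketch: SGD with uniform iterate averaging on a synthetic least-squares problem.
# All problem parameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear regression data: y = A @ x_true + noise
n, d = 1000, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
y = A @ x_true + 0.1 * rng.standard_normal(n)

def sgd_with_averaging(steps=5000, gamma=0.01):
    """Run SGD on f(x) = (1/2n) * ||A x - y||^2 using one random row per step.

    Returns both the last iterate and the running (uniform) average of all iterates.
    """
    x = np.zeros(d)
    x_avg = np.zeros(d)
    for t in range(1, steps + 1):
        i = rng.integers(n)               # sample one data point uniformly at random
        g = (A[i] @ x - y[i]) * A[i]      # stochastic gradient at the current iterate
        x = x - gamma * g                 # plain SGD step with constant step size
        x_avg += (x - x_avg) / t          # online update of the uniform average
    return x, x_avg

x_last, x_avg = sgd_with_averaging()
print("error of last iterate:    ", np.linalg.norm(x_last - x_true))
print("error of averaged iterate:", np.linalg.norm(x_avg - x_true))
```

Running the script prints the distance of each estimate from the true parameter vector; with a constant step size, the averaged iterate usually lands closer to the solution because averaging smooths out the noise-driven oscillations around the optimum.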