Lecture

Adaptive Gradient Methods: Theory and Applications

In course

EE-556: Mathematics of data: from theory to computation

This course provides an overview of key advances in continuous optimization and statistical analysis for machine learning. We review recent learning formulations and models as well as their guarantees.

Description

This lecture covers adaptive gradient methods such as AdaGrad, RMSProp, AcceleGrad, and ADAM, explaining their adaptation strategies, step-size adjustments, and convergence properties. It also discusses the implicit regularization of these methods, their generalization performance, and how they compare with traditional optimization algorithms. The presentation concludes with insights into neural network architectures and ongoing research in optimization methods.
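As a rough illustration of the step-size adjustment mentioned above, the following is a minimal sketch of the diagonal AdaGrad update in plain Python. It is not taken from the lecture materials: the function names `adagrad` and `quad_grad`, the test objective, and the values of `lr`, `eps`, and `steps` are all assumptions chosen for the example.

```python
import math

def adagrad(grad, x0, lr=0.5, eps=1e-8, steps=200):
    """Diagonal AdaGrad sketch; grad(x) returns the gradient as a list of floats."""
    x = list(x0)
    accum = [0.0] * len(x)  # running sum of squared gradients, one per coordinate
    for _ in range(steps):
        g = grad(x)
        for i, gi in enumerate(g):
            accum[i] += gi * gi
            # The effective step size lr / sqrt(accum[i]) shrinks as squared
            # gradients accumulate, independently for each coordinate.
            x[i] -= lr * gi / (math.sqrt(accum[i]) + eps)
    return x

# Hypothetical, badly scaled test problem: f(x) = 0.5 * (x[0]**2 + 100 * x[1]**2).
def quad_grad(x):
    return [x[0], 100.0 * x[1]]

x_final = adagrad(quad_grad, [1.0, 1.0])
print(x_final)
```

In this sketch no smoothness constant is supplied. Each coordinate's step is normalized by its own gradient history, so the two coordinates make similar progress even though their curvatures differ by a factor of 100.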

Instructor

Volkan Cevher

Volkan Cevher received the B.Sc. (valedictorian) in electrical engineering from Bilkent University in Ankara, Turkey, in 1999 and the Ph.D. in electrical and computer engineering from the Georgia Institute of Technology in Atlanta, GA, in 2005. He was a Research Scientist with the University of Maryland, College Park, from 2006 to 2007 and with Rice University in Houston, TX, from 2008 to 2009. Currently, he is an Associate Professor at the Swiss Federal Institute of Technology Lausanne and a Faculty Fellow in the Electrical and Computer Engineering Department at Rice University. His research interests include machine learning, signal processing theory, optimization theory and methods, and information theory. Dr. Cevher is an ELLIS fellow and was the recipient of the Google Faculty Research award in 2018, the IEEE Signal Processing Society Best Paper Award in 2016, a Best Paper Award at CAMSAP in 2015, a Best Paper Award at SPARS in 2009, and an ERC CG in 2016 as well as an ERC StG in 2011.

Related lectures (31)

Adaptive Optimization Methods: Theory and Applications

Explores adaptive optimization methods that adapt locally and converge without knowing the smoothness constant.

Feed-forward Networks

Introduces feed-forward networks, covering neural network structure, training, activation functions, and optimization, with applications in forecasting and finance.

Neural Networks Optimization

Explores neural networks optimization, including backpropagation, batch normalization, weight initialization, and hyperparameter search strategies.

Structures in Non-Convex Optimization

Covers non-convex optimization, deep learning training problems, stochastic gradient descent, adaptive methods, and neural network architectures.

Generalization in Deep Learning

Delves into the trade-off between model complexity and risk, generalization bounds, and the dangers of overfitting complex function classes.