Lecture

Adaptive Optimization Methods: Theory and Applications

Description

This lecture covers stochastic adaptive first-order methods that converge without knowledge of the smoothness constant by exploiting information contained in the stochastic gradients themselves. It introduces variable metric stochastic gradient descent and adaptive gradient methods that adapt locally by building a metric (a Hessian surrogate) from past stochastic gradient information. The lecture also discusses AdaGrad, AcceleGrad, RMSProp, and ADAM, highlighting their properties and convergence rates, and compares these adaptive algorithms in terms of optimization performance and generalization. Finally, it explores implicit regularization in adaptive methods and its implications for the generalization performance of adaptive learning methods.
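
To make the adaptive idea concrete, the following is a minimal sketch (not the lecture's own code) of diagonal AdaGrad in Python. It assumes a user-supplied stochastic gradient oracle grad_fn and a step size eta; the per-coordinate metric is built from the accumulated squared past gradients, so no smoothness constant is needed.

import numpy as np

def adagrad(grad_fn, x0, eta=0.1, eps=1e-8, n_steps=1000):
    # Minimal diagonal AdaGrad sketch.
    # grad_fn(x) is assumed to return a stochastic gradient at x (hypothetical oracle).
    x = np.asarray(x0, dtype=float).copy()
    g_sq_sum = np.zeros_like(x)              # accumulated squared gradients (diagonal metric)
    for _ in range(n_steps):
        g = grad_fn(x)                       # draw a stochastic gradient
        g_sq_sum += g ** 2                   # update the adaptive metric
        x -= eta * g / (np.sqrt(g_sq_sum) + eps)   # per-coordinate adaptive step
    return x

# Example usage on a noisy quadratic: gradients of 0.5*||x||^2 plus Gaussian noise.
rng = np.random.default_rng(0)
x_star = adagrad(lambda x: x + 0.1 * rng.standard_normal(x.shape), x0=np.ones(5))

RMSProp and ADAM follow the same template but replace the cumulative sum with an exponential moving average of squared gradients (and, for ADAM, add momentum on the gradient itself).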
