Lecture

Structures in Non-Convex Optimization

In course

EE-556: Mathematics of data: from theory to computation

This course provides an overview of key advances in continuous optimization and statistical analysis for machine learning. We review recent learning formulations and models as well as their guarantees

Description

This lecture covers scalable non-convex optimization with an emphasis on deep learning, discussing the optimization formulation for deep-learning training problems, barriers to training neural networks, and the convergence, avoidance, and speed of stochastic gradient descent. It also explores stochastic adaptive first-order methods, variable metric stochastic gradient descent, and adaptive gradient methods like AdaGrad, RMSProp, and ADAM. The lecture delves into the properties and convergence of AmsGrad and AcceleGrad, comparing them with traditional optimization algorithms. It also examines the performance of optimization algorithms in non-convex scenarios, the implicit regularization of adaptive methods, and explicit regularization through ε-stability. The lecture concludes with a discussion on generalization performance and neural network architectures.
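
As a rough illustration of the adaptive gradient methods named in the description, the sketch below applies textbook forms of the AdaGrad, RMSProp, and Adam updates to a one-dimensional non-convex toy problem. It is not taken from the lecture materials: the objective f(x) = x^4 - 2x^2, the starting point, the step sizes, and all function names are assumptions chosen only for illustration.

```python
import math

def grad(x):
    # Gradient of the assumed toy objective f(x) = x**4 - 2*x**2,
    # which has minima at x = -1 and x = 1 and a local maximum at x = 0.
    return 4 * x**3 - 4 * x

def adagrad(x, lr=0.1, eps=1e-8, steps=500):
    # AdaGrad: scale each step by the root of the running SUM of squared gradients.
    g_sq_sum = 0.0
    for _ in range(steps):
        g = grad(x)
        g_sq_sum += g * g
        x -= lr * g / (math.sqrt(g_sq_sum) + eps)
    return x

def rmsprop(x, lr=0.01, beta=0.9, eps=1e-8, steps=500):
    # RMSProp: replace the sum with an exponential moving average of squared gradients.
    v = 0.0
    for _ in range(steps):
        g = grad(x)
        v = beta * v + (1 - beta) * g * g
        x -= lr * g / (math.sqrt(v) + eps)
    return x

def adam(x, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    # Adam: moving averages of the gradient (m) and squared gradient (v),
    # each divided by a bias-correction factor before the update.
    m = 0.0
    v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1**t)
        v_hat = v / (1 - beta2**t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

x0 = 0.5  # assumed starting point, between the local maximum and the minimum at x = 1
results = {
    "AdaGrad": adagrad(x0),
    "RMSProp": rmsprop(x0),
    "Adam": adam(x0),
}
for name, x_final in results.items():
    print(f"{name}: x = {x_final:.4f}")
```

From this starting point all three runs should end close to the minimizer at x = 1. With a constant step size, RMSProp and Adam may hover near the minimizer rather than settle on it exactly, which is one reason their convergence behaviour is studied separately.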

Instructor

Volkan Cevher

Volkan Cevher received the B.Sc. (valedictorian) in electrical engineering from Bilkent University in Ankara, Turkey, in 1999 and the Ph.D. in electrical and computer engineering from the Georgia Institute of Technology in Atlanta, GA in 2005. He was a Research Scientist with the University of Maryland, College Park from 2006 to 2007 and also with Rice University in Houston, TX, from 2008 to 2009. Currently, he is an Associate Professor at the Swiss Federal Institute of Technology Lausanne and a Faculty Fellow in the Electrical and Computer Engineering Department at Rice University. His research interests include machine learning, signal processing theory, optimization theory and methods, and information theory. Dr. Cevher is an ELLIS fellow and was the recipient of the Google Faculty Research award in 2018, the IEEE Signal Processing Society Best Paper Award in 2016, a Best Paper Award at CAMSAP in 2015, a Best Paper Award at SPARS in 2009, and an ERC CG in 2016 as well as an ERC StG in 2011.

Ontological neighbourhood

Information engineering

Machine learning: Artificial neural networks

Related lectures (30)

The Hidden Convex Optimization Landscape of Deep Neural Networks

Explores the hidden convex optimization landscape of deep neural networks, showcasing the transition from non-convex to convex models.

Optimality of Convergence Rates: Accelerated/Stochastic Gradient Descent

Covers the optimality of convergence rates in accelerated and stochastic gradient descent methods for non-convex optimization problems.

Neural Networks: Training and Optimization

Explores the training and optimization of neural networks, addressing challenges like non-convex loss functions and local minima.

Structures in Non-Convex Optimization

Delves into structures in non-convex optimization, emphasizing scalable optimization for deep learning.

Structures in Non-Convex Optimization

Explores non-convex optimization in deep learning, covering critical points, SGD convergence, saddle points, and adaptive gradient methods.