Lecture

Neural networks under SGD

In course

Machine learning and data analysis are becoming increasingly central in many sciences and applications. This course concentrates on the theoretical underpinnings of machine learning.

Description

This lecture covers the optimization of neural networks using Stochastic Gradient Descent (SGD). The instructor explains the concept of dual risk versus empirical risk, the evolution of sparsity, and the speed and direction of the gradient flow. The lecture delves into the relationship between the gradient and speed, the discretization of equations, and the divergence in the context of neural networks.

Instructors (2)

Rüdiger Urbanke

Rüdiger L. Urbanke obtained his Dipl. Ing. degree from the Vienna University of Technology, Austria, in 1990 and the M.Sc. and PhD degrees in Electrical Engineering from Washington University in St. Louis, MO, in 1992 and 1995, respectively. He held a position at the Mathematics of Communications Department at Bell Labs from 1995 to 1999 before becoming a faculty member at the School of Computer & Communication Sciences (I&C) of EPFL. He is a member of the Information Processing Group.

He is principally interested in the analysis and design of iterative coding schemes, which allow reliable transmission close to theoretical limits at low complexity. Such schemes are part of most modern communications standards, including wireless transmission, optical communication and hard disk storage. More broadly, his research focuses on the analysis of graphical models and the application of methods from statistical physics to problems in communications.

From 2000 to 2004 he was an Associate Editor of the IEEE Transactions on Information Theory, and he is currently on the board of the series "Foundations and Trends in Communications and Information Theory." In 2017 he was President of the Information Theory Society. From 2009 to 2012 he was the head of the I&C doctoral school, in 2013 he served as Dean a.i. of I&C, and since 2016 he has been the Associate Dean for teaching of I&C. He is a co-author of the book "Modern Coding Theory" published by Cambridge University Press.

Awards:
2021 IEEE Information Theory Society Paper Award
2016 STOC Best Paper Award
2014 La Polysphere Teaching Award
2014 IEEE Hamming Medal
2013 IEEE Information Theory Society Paper Award
2011 MASCOTS Best Paper Award
2011 IEEE Koji Kobayashi Award
2009 La Polysphere Teaching Award
2002 IEEE Information Theory Society Paper Award
Fulbright Scholarship

His students have won the following awards:
M. Mondelli, 2021 IEEE Information Theory Paper Award
M. Mondelli, EPFL Doctorate Award, 2018
M. Mondelli, Patrick Denantes Award, 2017
M. Mondelli, IEEE IT Society Student Paper Award at ISIT, 2015
M. Mondelli, Dan David Prize Scholarship, 2015
H. Hassani, Inaugural Thomas Cover Dissertation Award, 2014
S. Kudekar, 2013 & 2021 IEEE Information Theory Paper Award
A. Karbasi, Patrick Denantes Award, 2013
V. Venkatesan, Best Paper Award at MASCOTS, 2011
A. Karbasi, Best Student Paper Award at ICASSP, 2011 (with R. Parhizkar)
A. Karbasi, Best Student Paper Award at ACM SIGMETRICS, 2010 (with S. Oh)
S. Korada, ABB Dissertation Award, 2010
S. Korada, IEEE IT Society Student Paper Award at ISIT, 2009 (with E. Sasoglu)
S. Korada, IEEE IT Society Student Paper Award at ISIT, 2008

Related lectures (27)

Vector Calculus Review: Maxwell Equations

Covers a review of vector calculus and the Maxwell equations in electromagnetism.

Vector Analysis: Scalar Fields

Covers the analysis of scalar fields, including divergence, gradient, and Laplacian.

Vector Fields: Gradient and Divergence

Covers vector fields, gradient, divergence, heat flux, and stress tensors.

Neural Networks: Training and Optimization

Explores the training and optimization of neural networks, addressing challenges like non-convex loss functions and local minima.