Lecture

Neural networks under SGD

In course

Machine learning and data analysis are becoming increasingly central in many sciences and applications. This course concentrates on the theoretical underpinnings of machine learning.

Description

This lecture covers neural networks trained with stochastic gradient descent (SGD). It relates the number of neurons in the hidden layers to the number of samples and to the parameters of the network. It then introduces the square loss and the optimization of the network by small gradient steps. The lecture goes on to discuss the evolution of the particles in the network, the interpretation of the true risk, and the consequences of replacing the particles with a density. It concludes with the limits involved and their justification in the context of neural network training.
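The setting described above can be illustrated with a minimal sketch, which is not taken from the lecture itself. It assumes a two-layer network whose output is the average of N neurons, treats each neuron's parameters (a_i, w_i) as one "particle", and performs one-pass SGD on the square loss with a fresh sample at every step. The tanh activation, the teacher direction `w_star`, the step size, and all function names are hypothetical choices made for illustration only.

```python
import numpy as np

def predict(x, a, W):
    """Two-layer network with 1/N scaling: average of N neuron outputs."""
    return np.mean(a * np.tanh(W @ x))

def sgd_step(x, y, a, W, lr):
    """One small SGD step on the square loss 0.5 * (f(x) - y)^2 for one sample."""
    n = a.shape[0]
    h = np.tanh(W @ x)                      # hidden activations, shape (N,)
    err = np.mean(a * h) - y                # prediction error on this sample
    grad_a = err * h / n                    # d loss / d a_i
    grad_W = (err * a * (1.0 - h**2) / n)[:, None] * x[None, :]  # d loss / d w_i
    # Assumption: the step is multiplied by N so that each particle moves
    # at a rate of order lr, independent of the width N.
    return a - lr * n * grad_a, W - lr * n * grad_W

def train(n_neurons=50, dim=3, n_steps=3000, lr=0.05, seed=0):
    """Run one-pass SGD and return the estimated risk before and after."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n_neurons)          # output weights, one per particle
    W = rng.normal(size=(n_neurons, dim))   # input weights, one row per particle
    w_star = np.ones(dim) / np.sqrt(dim)    # hypothetical teacher direction

    def risk(a, W, xs):
        # Monte Carlo estimate of the true risk on held-out samples.
        return np.mean([0.5 * (np.tanh(w_star @ x) - predict(x, a, W))**2
                        for x in xs])

    x_test = rng.normal(size=(500, dim))
    risk_before = risk(a, W, x_test)
    for _ in range(n_steps):
        x = rng.normal(size=dim)            # fresh sample at every step
        y = np.tanh(w_star @ x)             # noiseless teacher label
        a, W = sgd_step(x, y, a, W, lr)
    return risk_before, risk(a, W, x_test)

if __name__ == "__main__":
    before, after = train()
    print("estimated risk before:", before, "after:", after)
```

In this sketch the rows of `W` together with the entries of `a` form the empirical distribution of particles. Replacing the particles with a density corresponds, informally, to letting `n_neurons` grow while the step size shrinks.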

Instructors (2)

Nicolas Macris

Nicolas Macris received his PhD in theoretical physics from EPFL and then pursued his scientific activity at the mathematics department of Rutgers University (NJ, USA). He then joined the Faculty of Basic Science of EPFL, working in the field of quantum statistical mechanics and mathematical aspects of the quantum Hall effect. Since 2005 he has been with the Communication Theories Laboratory and Information Processing group of the School of Communication and Computer Science, and he currently works at the interface of statistical mechanics, information theory and error correcting codes, inference and learning theory. He has held long-term visiting appointments and collaborations with the University College and the Institute of Advanced studies in Dublin, the Ecole Normale Supérieure de Lyon, the Centre de Physique Theorique Luminy Marseille, Paris XI Orsay, the ETH Zürich and more recently Los Alamos National Lab.

Rüdiger Urbanke

Rüdiger L. Urbanke obtained his Dipl. Ing. degree from the Vienna University of Technology, Austria in 1990 and the M.Sc. and PhD degrees in Electrical Engineering from Washington University in St. Louis, MO, in 1992 and 1995, respectively. He held a position at the Mathematics of Communications Department at Bell Labs from 1995 till 1999 before becoming a faculty member at the School of Computer & Communication Sciences (I&C) of EPFL. He is a member of the Information Processing Group. He is principally interested in the analysis and design of iterative coding schemes, which allow reliable transmission close to theoretical limits at low complexities. Such schemes are part of most modern communications standards, including wireless transmission, optical communication and hard disk storage. More broadly, his research focuses on the analysis of graphical models and the application of methods from statistical physics to problems in communications. From 2000 to 2004 he was an Associate Editor of the IEEE Transactions on Information Theory and he is currently on the board of the series "Foundations and Trends in Communications and Information Theory." In 2017 he was President of the Information Theory Society. From 2009 till 2012 he was the head of the I&C doctoral school, in 2013 he served as Dean a. i. of I&C, and since 2016 he has been the Associate Dean for teaching of I&C. He is a co-author of the book "Modern Coding Theory" published by Cambridge University Press.

Awards:
2021 IEEE Information Theory Society Paper Award
2016 STOC Best Paper Award
2014 La Polysphere Teaching Award
2014 IEEE Hamming Medal
2013 IEEE Information Theory Society Paper Award
2011 MASCO Best Paper Award
2011 IEEE Koji Kobayashi Award
2009 La Polysphere Teaching Award
2002 IEEE Information Theory Society Paper Award
Fulbright Scholarship

His students have won the following awards:
M. Mondelli, 2021 IEEE Information Theory Paper Award
M. Mondelli, EPFL Doctorate Award 2018
M. Mondelli, Patrick Denantes Award, 2017
M. Mondelli, IEEE IT Society Student Paper Award at ISIT, 2015
M. Mondelli, Dan David Prize Scholarship, 2015
H. Hassani, Inaugural Thomas Cover Dissertation Award, 2014
S. Kudekar, 2013 & 2021 IEEE Information Theory Paper Award
A. Karbasi, Patrick Denantes Award, 2013
V. Venkatesan, Best Paper Award at MASCOTS, 2011
A. Karbasi, Best Student Paper Award at ICASSP, 2011 (with R. Parhizkar)
A. Karbasi, Best Student Paper Award at ACM SIGMETRICS, 2010 (with S. Oh)
S. Korada, ABB Dissertation Award, 2010
S. Korada, IEEE IT Society Student Paper Award at ISIT, 2009 (with E. Sasoglu)
S. Korada, IEEE IT Society Student Paper Award at ISIT, 2008

Related lectures (30)

Neural Networks: Training and Optimization

Explores the training and optimization of neural networks, addressing challenges like non-convex loss functions and local minima.

Gradient Descent: Optimization Techniques

Explores gradient descent, loss functions, and optimization techniques in neural network training.

Generalization in Deep Learning

Delves into the trade-off between model complexity and risk, generalization bounds, and the dangers of overfitting complex function classes.

Feed-forward Networks

Introduces feed-forward networks, covering neural network structure, training, activation functions, and optimization, with applications in forecasting and finance.

Multilayer Neural Networks: Deep Learning

Covers the fundamentals of multilayer neural networks and deep learning.