This lecture covers gradient descent, convex and non-convex loss functions, stochastic gradient descent, and early stopping in the context of neural network training. It explains the importance of small weights at the start of gradient descent, the effect of an increase in validation loss, and how the norm of the parameters evolves during training. The lecture also contrasts standard and stochastic gradient descent, emphasizing the computational efficiency of the latter. It discusses several optimization techniques and strategies, including the AdamW optimizer and early stopping as a form of regularization.
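To make the interplay between stochastic gradient descent and early stopping concrete, here is a minimal sketch in Python with NumPy. It is not taken from the lecture. It assumes a linear model with a mean-squared-error loss instead of a neural network, and the function name `sgd_early_stopping` and the `patience` parameter are hypothetical choices for illustration. Weights start small, each update uses only a minibatch (which is what makes a stochastic step cheaper than a full-batch step), and training stops once the validation loss has not improved for `patience` consecutive epochs.

```python
import numpy as np

def sgd_early_stopping(X_tr, y_tr, X_val, y_val, lr=0.01, batch_size=16,
                       max_epochs=200, patience=10, seed=0):
    """Minibatch SGD on a linear least-squares model with early stopping.

    Illustrative sketch only: the model, loss, and hyperparameters are
    assumptions, not the setup used in the lecture.
    """
    rng = np.random.default_rng(seed)
    n, d = X_tr.shape
    w = rng.normal(scale=1e-3, size=d)  # small initial weights
    best_w, best_val, bad_epochs = w.copy(), np.inf, 0
    history = []  # validation loss after each epoch

    for epoch in range(max_epochs):
        perm = rng.permutation(n)  # reshuffle the training set every epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            residual = X_tr[idx] @ w - y_tr[idx]
            # Gradient of the mean squared error on this minibatch only
            grad = 2.0 * X_tr[idx].T @ residual / len(idx)
            w -= lr * grad

        val_loss = float(np.mean((X_val @ w - y_val) ** 2))
        history.append(val_loss)

        if val_loss < best_val:
            # New best validation loss: remember these weights
            best_val, best_w, bad_epochs = val_loss, w.copy(), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation loss stopped improving

    return best_w, best_val, history
```

The function returns the weights from the epoch with the lowest validation loss, not the final weights. Keeping the earlier, better-validating parameters is the sense in which early stopping acts as a regularizer.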