Explores optimization methods such as gradient descent and subgradient methods for training machine learning models, including advanced techniques like the Adam optimizer.
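As a concrete illustration, here is a minimal sketch of a single Adam update (Kingma & Ba, 2015), assuming NumPy arrays for the parameters and gradients; the hyperparameter defaults are the commonly cited ones, and the function name `adam_step` is illustrative.

```python
import numpy as np

def adam_step(params, grads, m, v, t, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected moment estimates (t is 1-indexed)."""
    m = beta1 * m + (1 - beta1) * grads        # first moment (running mean of gradients)
    v = beta2 * v + (1 - beta2) * grads ** 2   # second moment (running mean of squared gradients)
    m_hat = m / (1 - beta1 ** t)               # bias correction for the first moment
    v_hat = v / (1 - beta2 ** t)               # bias correction for the second moment
    params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v
```

Dividing by the per-coordinate root of the second moment gives each parameter its own effective step size, which is what distinguishes Adam from plain gradient descent.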
Covers gradient descent in the scalar case, focusing on finding the minimum of a function by iteratively stepping in the direction of the negative gradient (in one dimension, the negative derivative).
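A minimal sketch of that scalar iteration, assuming a hand-coded derivative; the function names, step size, and iteration count here are illustrative choices, not prescribed values.

```python
def gradient_descent(df, x0, lr=0.1, n_steps=50):
    """Iteratively step opposite the derivative df to approach a minimum."""
    x = x0
    for _ in range(n_steps):
        x = x - lr * df(x)  # move against the gradient
    return x

# Example: minimize f(x) = (x - 3)^2, whose derivative is 2(x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # converges toward 3, the true minimizer
```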
Discusses Stochastic Gradient Descent (SGD) and its application to non-convex optimization, focusing on convergence rates and the challenges that arise in machine learning.
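A minimal SGD sketch on a linear least-squares problem, assuming synthetic data; the key point is that the gradient of a single randomly sampled example is an unbiased (but noisy) estimate of the full gradient. The data shapes and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                # 1000 examples, 5 features
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)  # noisy linear targets

w = np.zeros(5)
lr = 0.01
for t in range(10_000):
    i = rng.integers(len(X))            # sample one data point uniformly
    grad_i = (X[i] @ w - y[i]) * X[i]   # per-example gradient of 0.5 * (x_i @ w - y_i)^2
    w -= lr * grad_i                    # noisy descent step
print(np.linalg.norm(w - w_true))       # small residual error near the noise floor
```

With a constant step size, the iterates hover in a noise-dominated neighborhood of the solution; decaying the step size over time is the standard way to obtain the convergence rates the section refers to.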