Explores optimization methods such as gradient descent and subgradient methods for training machine learning models, including adaptive techniques such as Adam.
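Since the Adam update is only named here, a minimal sketch may help; this assumes a user-supplied gradient function and uses the commonly cited default hyperparameters, not any particular implementation from the source.

```python
# A minimal sketch of one Adam update step (assumed setup, not the source's code).
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply a single Adam update to parameters `theta` at step `t` (t >= 1)."""
    g = grad(theta)
    m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g ** 2   # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)           # bias correction for the mean
    v_hat = v / (1 - beta2 ** t)           # bias correction for the variance
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Usage on a toy objective f(theta) = ||theta||^2, whose gradient is 2*theta.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 1001):
    theta, m, v = adam_step(theta, lambda x: 2 * x, m, v, t, lr=0.05)
print(theta)  # approaches [0, 0]
```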
Covers stochastic gradient descent with iterate averaging, compares it with gradient descent, and discusses challenges in non-convex optimization and sparse recovery techniques.
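As an illustration of iterate averaging, the sketch below runs SGD on a synthetic least-squares problem and maintains a running mean of the iterates; the data, step size, and problem size are assumptions chosen only to make the example self-contained.

```python
# A minimal sketch of SGD with iterate averaging on least squares (assumed setup).
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)        # current SGD iterate
w_avg = np.zeros(d)    # running average of the iterates
lr = 0.01

for t in range(1, 10001):
    i = rng.integers(n)                  # sample one data point uniformly
    g = (X[i] @ w - y[i]) * X[i]         # stochastic gradient of 0.5*(x_i^T w - y_i)^2
    w = w - lr * g                       # plain SGD step
    w_avg = w_avg + (w - w_avg) / t      # incremental mean of the iterates

print("last iterate error:    ", np.linalg.norm(w - w_true))
print("averaged iterate error:", np.linalg.norm(w_avg - w_true))
```

The averaged iterate typically sits closer to the true solution than the last iterate, since averaging smooths out the noise of individual stochastic steps.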