This lecture covers distributionally robust optimization (DRO), which minimizes the worst-case expected loss over a set of distributions near the training distribution, rather than the average loss, to guard against distribution shift. It reviews the standard baseline of empirical risk minimization (ERM) solved with stochastic gradient descent (SGD), and introduces the conditional value at risk (CVaR) at level α, i.e., the average loss over the worst α-fraction of the data. The lecture compares biased and unbiased stochastic gradient estimators for the DRO objective, discusses acceleration techniques, and derives the associated complexity bounds. It also covers variance bounds, multilevel Monte Carlo gradient estimators, and applications of DRO to a range of problems. The instructor presents experimental results on linear classifiers and heterogeneous data distributions, highlighting generalization performance and open research questions.
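To make the CVaR objective and the bias of naive minibatch estimators concrete, here is a minimal NumPy sketch. It is illustrative only: the function names (`cvar`, `minibatch_cvar_grad`), the linear model with squared loss, and the batch size are assumptions for the example, not details from the lecture.

```python
import numpy as np

def cvar(losses, alpha):
    """Plug-in CVaR at level alpha: mean of the worst ceil(alpha * n) losses.

    For alpha = 1 this reduces to the ordinary average, i.e., the ERM objective.
    """
    n = len(losses)
    k = max(1, int(np.ceil(alpha * n)))
    worst = np.sort(losses)[-k:]          # the k largest losses
    return worst.mean()

def minibatch_cvar_grad(w, X, y, alpha, rng, batch_size=32):
    """Minibatch subgradient of the CVaR of squared losses for a linear model.

    Applying the plug-in CVaR to a minibatch is a *biased* estimator of the
    full-data CVaR gradient (the minibatch tail misrepresents the true tail);
    the bias shrinks as batch_size grows, which is one motivation for the
    multilevel estimators mentioned above. (Hypothetical illustration.)
    """
    idx = rng.choice(len(y), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    residuals = Xb @ w - yb
    losses = 0.5 * residuals ** 2
    k = max(1, int(np.ceil(alpha * batch_size)))
    worst = np.argsort(losses)[-k:]       # indices of the k largest losses
    # Subgradient: average per-example gradients over the worst k examples.
    return (Xb[worst] * residuals[worst, None]).mean(axis=0)
```

For example, `cvar(np.array([1., 2., 3., 4.]), 0.5)` averages the two largest losses and returns 3.5, while `alpha = 1.0` recovers the plain mean 2.5.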