This lecture explains why subtracting the mean reward matters in policy gradient methods for deep reinforcement learning. It covers the log-likelihood trick, online gradient rules for one-step and multi-step horizons, learning value functions, and the use of baselines. The instructor also discusses the REINFORCE algorithm with a baseline, the variance reduction achieved by subtracting the mean, and an outlook on deep reinforcement learning with AlphaZero-style networks.
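As a rough illustration of the variance reduction described in the summary, the sketch below computes the exact mean and variance of the one-sample policy gradient estimate `(r - b) * d/dtheta log pi(a)` for a two-action problem, once without a baseline (`b = 0`) and once with the mean reward as baseline. The setup is an assumption made for illustration and is not taken from the lecture: a Bernoulli policy with `P(a=1) = sigmoid(theta)`, deterministic rewards of 10 and 11, and the hypothetical helper names `grad_log_pi`, `expected_reward`, and `gradient_stats`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def grad_log_pi(action, p1):
    """d/dtheta of log pi(action) for a policy with P(action=1) = p1 = sigmoid(theta)."""
    return (1.0 - p1) if action == 1 else -p1


def expected_reward(theta, rewards):
    """Mean reward under the current policy; used here as the baseline."""
    p1 = sigmoid(theta)
    return (1.0 - p1) * rewards[0] + p1 * rewards[1]


def gradient_stats(theta, rewards, baseline):
    """Exact mean and variance of the one-sample estimate (r - b) * grad log pi(a).

    Both actions are enumerated, so no sampling noise is involved.
    """
    p1 = sigmoid(theta)
    probs = [1.0 - p1, p1]
    estimates = [(rewards[a] - baseline) * grad_log_pi(a, p1) for a in (0, 1)]
    mean = sum(p * g for p, g in zip(probs, estimates))
    var = sum(p * (g - mean) ** 2 for p, g in zip(probs, estimates))
    return mean, var


# Assumed toy values: both rewards are large, but they differ only slightly.
REWARDS = [10.0, 11.0]
THETA = 0.0

mean_plain, var_plain = gradient_stats(THETA, REWARDS, baseline=0.0)
mean_base, var_base = gradient_stats(
    THETA, REWARDS, baseline=expected_reward(THETA, REWARDS)
)

print("no baseline:   mean =", mean_plain, " variance =", var_plain)
print("mean baseline: mean =", mean_base, " variance =", var_base)
```

With these assumed values both estimators have the same expected gradient (0.25), while the variance drops from 27.5625 without a baseline to 0 with the mean reward subtracted. The variance vanishes completely only because this toy problem has two actions and deterministic rewards; in general, subtracting the mean leaves the expected gradient unchanged and typically reduces the variance rather than eliminating it.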