Lecture

Subtracting the mean reward via the value function

Description

This lecture explains why subtracting the mean reward matters in policy gradient methods for deep reinforcement learning. It covers the log-likelihood trick, online gradient rules for one-step and multi-step horizons, learning value functions, and the use of baselines. The instructor also discusses the REINFORCE algorithm with a baseline, the variance reduction achieved by subtracting the mean, and an outlook on deep reinforcement learning with AlphaZero networks.
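
To make the variance-reduction claim concrete, here is a minimal sketch that is not taken from the lecture itself. It assumes a hypothetical one-step problem with two actions, a softmax policy with parameters `theta`, and fixed rewards of 10 and 11; the function names `softmax` and `grad_stats` are illustrative. It computes, by enumerating the actions, the exact mean and variance of the single-sample log-likelihood gradient estimate for the first parameter, with and without the mean reward as a baseline.

```python
import math


def softmax(theta):
    """Action probabilities of a softmax policy (numerically stable)."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def grad_stats(theta, rewards, baseline):
    """Exact mean and variance of the single-sample estimate
    (r - baseline) * d/d theta_0 log pi(a), obtained by enumerating actions."""
    probs = softmax(theta)
    mean = 0.0
    second_moment = 0.0
    for a, (p, r) in enumerate(zip(probs, rewards)):
        # For a softmax policy: d log pi(a) / d theta_0 = 1[a == 0] - pi(0)
        score = (1.0 if a == 0 else 0.0) - probs[0]
        g = (r - baseline) * score
        mean += p * g
        second_moment += p * g * g
    return mean, second_moment - mean * mean


# Assumed toy setting: equal initial preferences, rewards 10 and 11.
theta = [0.0, 0.0]
rewards = [10.0, 11.0]
mean_reward = sum(p * r for p, r in zip(softmax(theta), rewards))

mean_plain, var_plain = grad_stats(theta, rewards, baseline=0.0)
mean_base, var_base = grad_stats(theta, rewards, baseline=mean_reward)

print("no baseline:   mean", mean_plain, "variance", var_plain)
print("with baseline: mean", mean_base, "variance", var_base)
```

In this toy setting the expected gradient is the same in both cases (-0.25), while the variance of the estimate drops from 27.5625 to 0 once the mean reward is subtracted. The exact zero is a special feature of the two-action example; in general a baseline reduces the variance without removing it.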
