Lecture

Policy Gradient Methods: Multiple Time Steps

Description

This lecture covers policy gradient methods for episodes that span multiple time steps, where the policy parameters are updated to maximize the expected total discounted reward. The slides derive the update rule, show how accumulated rewards are computed over an episode, and give pseudo-code for the REINFORCE algorithm.
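
As a rough illustration of the two computations named in the description, the sketch below computes the discounted return at every step of one episode and then applies a single REINFORCE-style update. It is not taken from the slides: the tabular softmax policy, the function names (`discounted_returns`, `reinforce_update`), and the parameters `gamma` and `lr` are all assumptions made for this example, and the update weights each step by the return from that step onward, which is a common variant that may differ from the form derived in the lecture.

```python
import numpy as np


def discounted_returns(rewards, gamma):
    """Return G_t = sum over k >= t of gamma^(k - t) * r_k for each step t."""
    returns = np.zeros(len(rewards))
    running = 0.0
    # Walk backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()


def reinforce_update(theta, states, actions, rewards, gamma=0.99, lr=0.01):
    """One gradient-ascent step from a single episode.

    theta is a (n_states, n_actions) table of logits; this tabular softmax
    policy is an assumption of the sketch, not something the lecture specifies.
    """
    returns = discounted_returns(rewards, gamma)
    grad = np.zeros_like(theta, dtype=float)
    for t, (s, a) in enumerate(zip(states, actions)):
        probs = softmax(theta[s])
        # Gradient of log pi(a | s) with respect to the logits theta[s].
        grad_log = -probs
        grad_log[a] += 1.0
        grad[s] += returns[t] * grad_log
    # Ascent step: move the parameters in the direction of higher return.
    return theta + lr * grad
```

For example, calling `reinforce_update(np.zeros((2, 2)), [0], [1], [1.0], gamma=0.9, lr=0.1)` raises the logit of the rewarded action in state 0 and lowers the other one, while the row for the unvisited state stays unchanged.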
