This lecture covers policy gradient methods in reinforcement learning. It opens with an overview of reinforcement learning approaches, contrasting value-based and policy-based methods, and then formulates policy-based learning as an optimization problem, emphasizing how policies are parameterized for both discrete and continuous actions. Parameterization techniques such as softmax policies and neural networks are introduced. The lecture then develops the policy gradient method, showing how gradients are computed from stochastic estimates and why unbiased gradient estimators matter. The instructor highlights the high variance of these estimators and presents variance-reduction techniques such as baseline functions. The lecture concludes with a practical example, applying policy gradient methods to the cartpole problem and showing how they learn to balance the pole, giving a comprehensive picture of policy gradient methods and their applications in reinforcement learning.
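
To make the summarized ideas concrete, below is a minimal sketch of a score-function (REINFORCE-style) policy gradient update with a baseline, applied to the cartpole task. It is illustrative only: the environment name (CartPole-v1 from Gymnasium), the network architecture, the learning rate, and the running-average return baseline are assumptions for this sketch, not details taken from the lecture.

```python
# Minimal REINFORCE-with-baseline sketch (illustrative; hyperparameters are assumptions).
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")            # assumed environment name
obs_dim = env.observation_space.shape[0]
n_actions = env.action_space.n

# Softmax policy: a small neural network maps a state to action logits.
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

baseline = 0.0                           # running-average return used as a simple baseline
for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))   # log pi(a|s), the score-function term
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Undiscounted episode return; subtracting a constant baseline reduces the variance
    # of the gradient estimate without introducing bias.
    episode_return = sum(rewards)
    advantage = episode_return - baseline
    baseline = 0.9 * baseline + 0.1 * episode_return

    # Stochastic policy gradient estimate: grad log pi(a_t|s_t) * (return - baseline).
    loss = -torch.stack(log_probs).sum() * advantage
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Using the full episode return is the simplest unbiased estimator; a state-dependent baseline, such as a learned value function, typically reduces variance further.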