Lecture

Subtracting the mean reward via the value function

Description

This lecture explains why subtracting the mean reward matters in policy gradient methods for deep reinforcement learning. It covers the log-likelihood trick, online gradient rules for one-step and multi-step horizons, learning value functions, and the use of baselines. The instructor also discusses the REINFORCE algorithm with a baseline, the variance reduction achieved by subtracting the mean, and an outlook on deep reinforcement learning with AlphaZero networks.
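The variance reduction mentioned in the description can be illustrated with a minimal sketch. It is not taken from the lecture: it assumes a one-step problem with a single state, a softmax policy over three actions, and deterministic rewards per action, and all names (`softmax`, `grad_log_prob`, `estimator_stats`) are hypothetical. Under these assumptions the value of the single state equals the mean reward under the policy, so using it as the baseline is the same as subtracting the mean reward. The code computes the exact mean and variance of the gradient estimate (r - b) * grad log pi(a) with and without that baseline.

```python
import math


def softmax(prefs):
    """Turn action preferences (policy parameters) into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(probs, action):
    """Gradient of log pi(action) w.r.t. the preferences of a softmax policy."""
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def estimator_stats(prefs, rewards, baseline):
    """Exact mean and total variance of (r - b) * grad log pi(a).

    Assumes a one-step problem in which each action yields a fixed reward,
    so the expectation over actions can be computed without sampling.
    """
    probs = softmax(prefs)
    n = len(prefs)
    samples = []
    for a in range(n):
        g = grad_log_prob(probs, a)
        samples.append([(rewards[a] - baseline) * gk for gk in g])
    mean = [sum(probs[a] * samples[a][k] for a in range(n)) for k in range(n)]
    var = sum(
        probs[a] * sum((samples[a][k] - mean[k]) ** 2 for k in range(n))
        for a in range(n)
    )
    return mean, var


# Illustrative values (assumptions): uniform initial policy, large reward offset.
prefs = [0.0, 0.0, 0.0]
rewards = [10.0, 11.0, 12.0]

probs = softmax(prefs)
mean_reward = sum(p * r for p, r in zip(probs, rewards))  # value of the single state

mean_plain, var_plain = estimator_stats(prefs, rewards, baseline=0.0)
mean_base, var_base = estimator_stats(prefs, rewards, baseline=mean_reward)

print("expected gradient without baseline:", mean_plain)
print("expected gradient with baseline:   ", mean_base)
print("variance without baseline:", var_plain)
print("variance with baseline:   ", var_base)
```

In this sketch the expected gradient is the same in both cases, because the expectation of grad log pi(a) under the policy is zero, while the variance is much smaller once the mean reward is subtracted. In a multi-step setting the constant baseline would typically be replaced by a learned state-value estimate.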

Official source

https://mediaspace.epfl.ch/media/0_8cgyxzja
