Lecture

Proximal Policy Optimization for Continuous Control

Related lectures (32)
Subtracting the mean reward via the value function
Covers why subtracting the mean reward in policy gradient methods for deep reinforcement learning reduces noise in the stochastic gradient.
Deep Learning for Autonomous Vehicles
Explores deep learning for autonomous vehicles, covering perception, action, and social forecasting in the context of sensor technologies and ethical considerations.
