This lecture covers the challenges of continuous-state reinforcement learning, such as the curse of dimensionality and the resulting need for function approximation to estimate the value function. It explains how to learn the value function with Monte-Carlo and Temporal-Difference methods, and how to update it from roll-outs. The lecture then treats function approximation for the value function in more detail, with examples of how to parametrize it and how to choose features. It discusses the transition from the value function to the policy and introduces policy gradients as an alternative approach, including Policy learning by Weighted Exploration with the Returns (PoWER) and the use of human demonstrations for imitation learning. The lecture concludes with examples of learned policies after multiple trials.
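To make the value-function part of the summary concrete, the sketch below shows TD(0) learning with a linear value function over radial-basis-function features of a continuous state. The feature map, the grid of centers, and the dummy dynamics and reward are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

def rbf_features(state, centers, width=0.5):
    """Radial-basis-function features phi(s) for a continuous state."""
    dists = np.linalg.norm(centers - state, axis=1)
    return np.exp(-(dists / width) ** 2)

def td0_update(theta, phi_s, reward, phi_next, alpha=0.05, gamma=0.95, done=False):
    """One TD(0) step: theta <- theta + alpha * delta * phi(s),
    with delta = r + gamma * V(s') - V(s) and V(s) = theta . phi(s)."""
    v_s = theta @ phi_s
    v_next = 0.0 if done else theta @ phi_next
    delta = reward + gamma * v_next - v_s
    return theta + alpha * delta * phi_s

# Example usage: a 2-D continuous state space with a grid of RBF centers.
rng = np.random.default_rng(0)
centers = np.array([[x, y] for x in np.linspace(-1, 1, 5)
                           for y in np.linspace(-1, 1, 5)])
theta = np.zeros(len(centers))

state = rng.uniform(-1, 1, size=2)
for t in range(1000):
    next_state = np.clip(state + rng.normal(0, 0.1, size=2), -1, 1)  # dummy dynamics
    reward = -np.linalg.norm(next_state)                             # dummy reward
    theta = td0_update(theta, rbf_features(state, centers),
                       reward, rbf_features(next_state, centers))
    state = next_state
```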
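For the policy-based part, a minimal REINFORCE-style policy-gradient sketch is given below, assuming a Gaussian policy over a 1-D continuous action with a linear mean. The feature map and the toy episode generator are assumptions for illustration only; PoWER differs in that it forms a return-weighted average of exploration noise rather than taking a gradient step.

```python
import numpy as np

def policy_mean(theta, features):
    """Mean of the Gaussian policy, linear in the state features."""
    return theta @ features

def sample_action(theta, features, sigma=0.2, rng=None):
    """Sample a ~ N(theta . phi(s), sigma^2)."""
    rng = rng or np.random.default_rng()
    return policy_mean(theta, features) + sigma * rng.normal()

def reinforce_update(theta, episode, sigma=0.2, alpha=0.01, gamma=0.95):
    """theta <- theta + alpha * sum_t G_t * grad log pi(a_t | s_t)."""
    grad = np.zeros_like(theta)
    G = 0.0
    for features, action, reward in reversed(episode):
        G = reward + gamma * G
        # grad of log N(a; theta.phi, sigma^2) with respect to theta
        grad += G * (action - policy_mean(theta, features)) / sigma**2 * features
    return theta + alpha * grad

# Toy usage: episodes are lists of (features, action, reward) triples.
rng = np.random.default_rng(1)
theta = np.zeros(3)
for _ in range(200):
    episode = []
    for t in range(20):
        features = rng.uniform(-1, 1, size=3)        # stand-in for real observations
        action = sample_action(theta, features, rng=rng)
        reward = -(action - features.sum()) ** 2     # dummy reward
        episode.append((features, action, reward))
    theta = reinforce_update(theta, episode)
```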
This video is available exclusively on MediaSpace for a restricted audience. Please log in to MediaSpace to access it if you have the necessary permissions.
Watch on MediaSpace