This lecture covers finite-horizon reinforcement learning (RL) and introduces non-stationary policies, in which the optimal action depends on the timestep as well as the state; the instructor uses basketball as an analogy, since end-of-game strategy differs from early-game strategy. The lecture then turns to Optimistic Proximal Policy Optimization (OPPO), which uses optimistic estimates of the value functions to guide policy updates. The instructor walks through the algorithm's structure, emphasizing how transitions are estimated from empirical observations and augmented with exploration bonuses, and explains why this optimism-driven exploration can outperform methods that do not explore deliberately. The lecture concludes by comparing OPPO with Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), highlighting their theoretical underpinnings and practical implications for reinforcement learning.
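The two core ideas above, non-stationary (time-indexed) policies and optimism via count-based bonuses on empirically estimated transitions, can be illustrated with a short sketch. This is not the lecture's OPPO algorithm itself (OPPO performs policy-optimization updates); it is a minimal finite-horizon backward induction with an assumed Hoeffding-style bonus `b = c / sqrt(n)`, and the toy MDP below is invented for illustration:

```python
import numpy as np

def optimistic_backward_induction(P_hat, R, counts, H, c=1.0):
    """Finite-horizon backward induction with an optimistic bonus.

    Returns a *non-stationary* policy: one action map per timestep
    h = 0..H-1, since the best action can change as the horizon nears.

    P_hat:  (S, A, S) empirical transition estimates
    R:      (S, A)    known rewards in [0, 1]
    counts: (S, A)    visit counts; bonus b = c / sqrt(n) (assumed form)
    """
    S, A, _ = P_hat.shape
    V = np.zeros(S)                       # terminal value V_H = 0
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        bonus = c / np.sqrt(np.maximum(counts, 1))
        Q = R + bonus + P_hat @ V         # optimistic Q at step h
        Q = np.minimum(Q, H - h)          # clip to max achievable return
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V

# Toy 2-state MDP (hypothetical): in state 0, action 0 pays 0.1 and
# stays; action 1 pays 0 but reaches state 1, where every action pays 1.
P_hat = np.zeros((2, 2, 2))
P_hat[0, 0, 0] = 1.0   # state 0, action 0 -> stay in 0
P_hat[0, 1, 1] = 1.0   # state 0, action 1 -> move to 1
P_hat[1, :, 1] = 1.0   # state 1 is absorbing
R = np.array([[0.1, 0.0], [1.0, 1.0]])
counts = np.full((2, 2), 10**6)          # well-explored: bonus ~ 0

policy, V = optimistic_backward_induction(P_hat, R, counts, H=5)
print(policy[:, 0])   # action chosen in state 0 at each timestep
```

With a long remaining horizon the policy "invests" (action 1, moving toward the high-reward state), but at the final step it takes the immediate reward (action 0), which is exactly the basketball-style time dependence the lecture describes. Shrinking `counts` for an under-visited pair inflates its bonus, steering the agent to explore it.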