Lecture

Learning Agents: Exploration-Exploitation Tradeoff

Description

This lecture covers the implementation of reactive agents that learn from observations, focusing on the exploration-exploitation tradeoff that arises when the effects of actions are unknown. It also discusses scenarios in which an adversary can influence the world, and techniques for developing strategies that remain robust in those scenarios. Topics include multi-armed bandits, contextual bandits, and Q-learning, along with strategies such as epsilon-greedy, Thompson sampling, and regret matching. The lecture also explores the challenges of learning with state transitions and the use of deep Q-learning and experience replay to address them.
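
To illustrate the exploration-exploitation tradeoff named in the description, here is a minimal sketch of an epsilon-greedy agent on a multi-armed bandit. It is not taken from the lecture. The function name `epsilon_greedy_bandit`, the Bernoulli reward model, and the parameter values are assumptions chosen for the example.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit (hypothetical setup).

    true_means[a] is the success probability of arm a. The agent does not
    see these values; it only observes the sampled 0/1 rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms       # number of pulls per arm
    estimates = [0.0] * n_arms  # running mean reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Sample a Bernoulli reward from the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the sample mean for that arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


if __name__ == "__main__":
    estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
    print("estimated means:", [round(e, 2) for e in estimates])
    print("pull counts:", counts)
```

With a small `epsilon`, most pulls go to whichever arm currently looks best, while the occasional random pull keeps refining the estimates of the other arms. Raising `epsilon` improves those estimates at the cost of more pulls spent on arms that are already known to be worse.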
