Lecture

Reinforcement Learning: Q-Learning

Description

This lecture covers Q-Learning, in which the optimal policy is found by iteratively updating a table of Q-values based on observed rewards. It explains how to represent the Q-table, define a cost function over Q-values, and learn the optimal Q-values with gradient descent. The lecture then turns to Deep Q-Learning, where a neural network approximates the Q-values, and examines the challenges posed by large state spaces in games such as Atari. It also discusses the REINFORCE algorithm for policy gradient methods and Monte-Carlo Tree Search for decision-making, and concludes with a glimpse of AlphaGo Zero, a milestone in reinforcement learning. Along the way it explains key concepts such as the Bellman equation, value networks, and policy networks.
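The iterative Q-table update described above can be sketched in a few lines. The following is a minimal illustration, not the lecture's own implementation: it assumes a toy deterministic chain MDP (states 0 to 4, actions left/right, reward 1 for reaching the last state) and applies the Bellman-style update Q(s,a) ← Q(s,a) + α·(r + γ·maxₐ′ Q(s′,a′) − Q(s,a)) with ε-greedy exploration.

```python
import random

# Hypothetical toy environment, assumed for illustration:
# chain of states 0..4; action 0 moves left, action 1 moves right;
# reaching state 4 yields reward 1 and ends the episode.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    reward = 1.0 if s2 == GOAL else 0.0
    return s2, reward, s2 == GOAL

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per state, one column per action, initialized to 0.
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda act: Q[s][act])
            s2, r, done = step(s, a)
            # Bellman update toward r + gamma * max_a' Q(s', a').
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# The greedy policy learned from Q moves right in every non-goal state.
```

With enough episodes the table converges to Q(s, right) = γ^(GOAL−1−s), so the greedy policy recovers the optimal "always move right" behavior; Deep Q-Learning replaces the table lookup `Q[s][a]` with a neural network evaluated at (s, a).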
