This lecture covers Q-Learning, which finds the optimal policy by iteratively updating a Q-table from observed rewards. It explains how to represent the Q-table, define a cost function, and learn the optimal Q-values by gradient descent. The lecture then turns to Deep Q-Learning, in which a neural network approximates the Q-values, and examines the challenge of handling the large state spaces of games such as Atari. It also discusses the REINFORCE algorithm, a policy-gradient method, and Monte-Carlo Tree Search for decision-making. The presentation concludes with an overview of AlphaGo Zero, a milestone in reinforcement learning. Along the way, it explains concepts such as the Bellman equation, value networks, and policy networks.
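As a rough illustration of the Q-table update described above, here is a minimal Python sketch. It is not the lecture's own code. It assumes a hypothetical five-state chain environment (`step`, `N_STATES`, and `ACTIONS` are made up for this example) and illustrative hyperparameters (`lr`, `gamma`, `epsilon`). Each update moves Q(s, a) toward the Bellman target r + γ max Q(s', ·). Holding the target fixed, this is one gradient-descent step on the squared error ½(target − Q(s, a))².

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is the terminal goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Hypothetical toy environment: reward 1 only when the goal state is reached."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def q_learning(episodes=500, lr=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per state, one column per action, initialised to zero
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy action selection, breaking ties at random
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bellman target; no bootstrapping from the terminal state
            target = reward if done else reward + gamma * max(q[next_state])
            # gradient step on 0.5 * (target - Q(s, a))**2 with the target held constant
            q[state][action] += lr * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
# greedy action in each non-terminal state
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)  # prints [1, 1, 1, 1]
```

The learned greedy policy moves right (action 1) in every non-terminal state, the shortest path to the reward, and Q(0, right) approaches γ³ = 0.729. Deep Q-Learning keeps the same squared-error objective but replaces the table `q` with a neural network, applying the gradient step to the network's weights instead of to individual table entries.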