Lecture

Reinforcement Learning: Bandit Problems

Description

This lecture presents a two-line proof of convergence in expectation for the learning rule used in reinforcement learning with a 1-step horizon (the bandit setting). The proof shows that the empirical estimate of the Q-value converges, in expectation, to the true Q-value.
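
The description does not state the learning rule itself, so the following is only a minimal sketch under an assumption: that the rule is the common running-average update Q(a) ← Q(a) + η (r − Q(a)). Under that assumption, setting the expected update to zero gives Q(a) = E[r | a], which is the kind of fixed-point argument a two-line proof would rest on. The toy bandit, the function names `estimate_q` and `mean_over_runs`, the Gaussian reward noise, and all parameter values are hypothetical and chosen for illustration only.

```python
import random

def estimate_q(true_means, eta=0.01, steps=20000, noise_std=1.0, seed=0):
    """Run the assumed update Q(a) += eta * (r - Q(a)) on a toy bandit.

    true_means[a] plays the role of the true Q-value of action a
    (hypothetical setup, not taken from the lecture).
    """
    rng = random.Random(seed)
    q = [0.0] * len(true_means)
    for _ in range(steps):
        a = rng.randrange(len(true_means))         # pick an action uniformly at random
        r = rng.gauss(true_means[a], noise_std)    # noisy reward with mean true_means[a]
        q[a] += eta * (r - q[a])                   # assumed learning rule
    return q

def mean_over_runs(true_means, runs=20, **kwargs):
    """Average the final estimates over independent runs (seeds 0..runs-1)."""
    totals = [0.0] * len(true_means)
    for seed in range(runs):
        q = estimate_q(true_means, seed=seed, **kwargs)
        totals = [t + v for t, v in zip(totals, q)]
    return [t / runs for t in totals]

if __name__ == "__main__":
    true_means = [1.0, -0.5]
    print("single run:   ", estimate_q(true_means))
    print("mean of runs: ", mean_over_runs(true_means))
```

With a constant η, a single run keeps fluctuating around the true value rather than settling exactly on it. Averaging over independent runs approximates the expectation that the convergence statement refers to.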
