This lecture introduces Monte-Carlo methods for reinforcement learning, which estimate values directly by averaging empirically measured returns, and contrasts them with TD methods, which exploit the Bellman equation. The lecture covers Monte-Carlo estimation, first-visit MC prediction, Monte-Carlo estimation of Q-values, and batch-expected SARSA. It then compares SARSA, Monte-Carlo, and batch-expected-SARSA learning, emphasizing the role of the empirical Bellman equation. The lecture concludes by comparing Monte-Carlo with batch-TD methods, highlighting how the 'bootstrap' step lets TD methods propagate information back through the graph more efficiently.
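To make the contrast concrete, here is a minimal sketch, not taken from the lecture itself: it assumes a generic `sample_episode()` interface returning a list of `(state, reward)` pairs, and all function names and hyperparameters are illustrative. The first function implements first-visit MC prediction, averaging the return observed from the first visit to each state; the second shows a TD(0)-style update in which the current estimate of the next state's value stands in for the rest of the return (the 'bootstrap' step).

```python
from collections import defaultdict

def first_visit_mc_prediction(sample_episode, num_episodes, gamma=0.9):
    """First-visit Monte-Carlo prediction: estimate V(s) by averaging
    the returns observed at the first visit to s in each episode."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    V = defaultdict(float)

    for _ in range(num_episodes):
        # Assumed interface: [(s_0, r_1), (s_1, r_2), ...], where r_{t+1}
        # is the reward received after leaving state s_t.
        episode = sample_episode()

        # Record the time step of the first visit to each state.
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            if s not in first_visit:
                first_visit[s] = t

        # Compute returns backwards: G_t = r_{t+1} + gamma * G_{t+1}.
        returns = [0.0] * len(episode)
        G = 0.0
        for t in reversed(range(len(episode))):
            _, r = episode[t]
            G = r + gamma * G
            returns[t] = G

        # Average only the return from the first visit to each state.
        for s, t in first_visit.items():
            returns_sum[s] += returns[t]
            returns_count[s] += 1
            V[s] = returns_sum[s] / returns_count[s]
    return V

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) update. Bootstrap: V[s_next] replaces the rest of the
    return, so information propagates one step back after every single
    transition, instead of only once an episode has terminated."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
```

The batch variants discussed in the lecture apply the same ideas to a fixed set of stored episodes rather than to a stream of fresh ones; this sketch only illustrates the core estimation-versus-bootstrap distinction.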