Lecture

Relation of SARSA and Bellman equation

Description

This lecture presents a proof sketch relating the fluctuating Q-values of the SARSA algorithm to the Bellman equation. The instructor states the assumptions behind the SARSA update rule, then takes expectations over rewards, state transitions, and actions, showing that the averaged update converges to a fixed point given by the Bellman equation. A key step is the small-learning-rate limit, which justifies treating the policy as approximately constant while the statistical average is taken. Under this approximation, the expectation of the estimated Q-values, Q̂(s,a), is shown to satisfy the Bellman equation.
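The argument can be checked numerically: on a toy Markov decision process with a fixed policy, tabular SARSA run with a small learning rate should settle near the Bellman fixed point Q^π. The sketch below assumes a hypothetical two-state, two-action MDP (the transition matrix, rewards, and policy are illustrative inventions, not taken from the lecture), and for simplicity feeds SARSA the expected reward R(s,a) rather than a sampled reward.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma, eta = 2, 2, 0.9, 0.005

# Hypothetical toy MDP (illustration only, not from the lecture).
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.9, 0.1]]])   # P[s, a, s'] transition probabilities
R = np.array([[1.0, 0.0], [0.5, 2.0]])     # expected reward R[s, a]
pi = np.array([[0.6, 0.4], [0.3, 0.7]])    # fixed policy pi[s, a]

# Bellman fixed point for this policy: Q = R + gamma * M Q, where
# M[(s,a),(s',a')] = P[s,a,s'] * pi[s',a'].  Solve the linear system directly.
M = (P[:, :, :, None] * pi[None, None, :, :]).reshape(nS * nA, nS * nA)
Q_star = np.linalg.solve(np.eye(nS * nA) - gamma * M, R.reshape(-1)).reshape(nS, nA)

# Tabular SARSA with a small learning rate eta on the same fixed policy.
Q = np.zeros((nS, nA))
s = 0
a = rng.choice(nA, p=pi[s])
for _ in range(400_000):
    s2 = rng.choice(nS, p=P[s, a])       # sample next state s' ~ P(.|s,a)
    a2 = rng.choice(nA, p=pi[s2])        # sample next action a' ~ pi(.|s')
    # SARSA update: move Q(s,a) toward r + gamma * Q(s',a')
    Q[s, a] += eta * (R[s, a] + gamma * Q[s2, a2] - Q[s, a])
    s, a = s2, a2

# The fluctuating SARSA estimate should hover near the Bellman solution.
print(np.max(np.abs(Q - Q_star)))
```

Because the learning rate is small, the residual fluctuations of Q̂(s,a) around Q^π are of order √η, which is exactly the "policy approximately constant during averaging" regime the proof sketch relies on.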
