This lecture sketches a proof relating the fluctuating Q-values in SARSA to the Bellman equation. The instructor states the assumptions, then takes expectations over the SARSA update rule, showing that at the fixed point the expected Q-values satisfy the Bellman equation. The argument treats the reward and the next state-action pair as random variables and rests on a small learning rate: when the learning rate is small, the policy changes slowly enough to be treated as constant during the statistical averaging. Under this approximation, the expectation of the estimated Q-value, E[Q̂(s,a)], is shown to satisfy the Bellman equation for the current policy.
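The argument can be checked numerically. The sketch below uses a hypothetical two-state, two-action MDP (the transition matrix `P`, reward table `R`, and fixed policy `pi` are illustrative choices, not from the lecture): it solves the Bellman equation for Q^π exactly as a linear system, then runs SARSA with a small learning rate under the same fixed policy and verifies that the fluctuating estimates hover near the Bellman solution.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP, chosen only to illustrate the argument.
rng = np.random.default_rng(0)
n_s, n_a, gamma = 2, 2, 0.9

# P[s, a] -> next-state distribution; R[s, a] -> expected reward.
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# Fixed policy pi(a|s), held constant as in the proof sketch.
pi = np.array([[0.5, 0.5],
               [0.2, 0.8]])

# Exact Q^pi from the Bellman equation:
#   Q(s,a) = R(s,a) + gamma * sum_{s'} P(s,a,s') sum_{a'} pi(a'|s') Q(s',a')
# Solved as a linear system over the flattened (s, a) index.
A = np.eye(n_s * n_a)
b = R.flatten()
for s in range(n_s):
    for a in range(n_a):
        for s2 in range(n_s):
            for a2 in range(n_a):
                A[s * n_a + a, s2 * n_a + a2] -= gamma * P[s, a, s2] * pi[s2, a2]
Q_bellman = np.linalg.solve(A, b).reshape(n_s, n_a)

# SARSA with a small learning rate eta; rewards are sampled with noise,
# so the individual Q-estimates fluctuate around the fixed point.
Q = np.zeros((n_s, n_a))
eta = 0.01
s = 0
a = rng.choice(n_a, p=pi[s])
for _ in range(200_000):
    s2 = rng.choice(n_s, p=P[s, a])
    r = R[s, a] + rng.normal(0, 0.1)   # stochastic reward sample
    a2 = rng.choice(n_a, p=pi[s2])
    Q[s, a] += eta * (r + gamma * Q[s2, a2] - Q[s, a])
    s, a = s2, a2

# The time-fluctuating estimates sit close to the Bellman solution.
print(np.max(np.abs(Q - Q_bellman)))
```

The residual gap scales with the learning rate: shrinking `eta` shrinks the fluctuations, which is exactly why the proof can treat the policy as constant while averaging.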