This lecture covers the theory of Reinforcement Learning, the Exploration/Exploitation dilemma, Temporal Difference Learning, and Eligibility Traces. Eligibility traces let each temporal-difference update also adjust the values of actions taken earlier along the current trajectory, not only the most recent one. The lecture then presents the SARSA algorithm, including how its action values are initialized and how they are updated. Additional reading is recommended for further study.
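To make the SARSA initialization and update rules concrete, here is a minimal tabular sketch of SARSA with eligibility traces (often written SARSA(λ)). It assumes a zero initialization of Q, epsilon-greedy exploration as the answer to the exploration/exploitation trade-off, and accumulating traces. Each step computes the TD error δ = r + γ·Q(s', a') − Q(s, a) using the next action a' actually chosen (which is what makes SARSA on-policy), then applies Q(s, a) ← Q(s, a) + α·δ·e(s, a) to every pair with a nonzero trace e, decaying all traces by γλ. The corridor environment, the names `corridor_step`, `epsilon_greedy`, and `sarsa_lambda`, and all parameter values are hypothetical illustrations, not taken from the lecture.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: a 1-D corridor with states 0..4 and a wall at 0.
# Stepping from state 4 to state 5 reaches the goal and ends the episode.
N_STATES = 5
LEFT, RIGHT = 0, 1


def corridor_step(state, action):
    """Deterministic transition; reward 1.0 only when the goal is reached."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    if nxt == N_STATES:
        return nxt, 1.0, True
    return nxt, 0.0, False


def epsilon_greedy(Q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.choice((LEFT, RIGHT))
    best = max(Q[(state, a)] for a in (LEFT, RIGHT))
    return rng.choice([a for a in (LEFT, RIGHT) if Q[(state, a)] == best])


def sarsa_lambda(episodes=500, alpha=0.1, gamma=0.9, lam=0.8,
                 epsilon=0.1, seed=0, max_steps=10_000):
    rng = random.Random(seed)
    Q = defaultdict(float)  # initialization: Q(s, a) = 0 for every pair
    for _ in range(episodes):
        E = defaultdict(float)  # eligibility traces are reset at the start of each episode
        s = 0
        a = epsilon_greedy(Q, s, epsilon, rng)
        for _ in range(max_steps):
            s2, r, done = corridor_step(s, a)
            if done:
                target, a2 = r, None  # no bootstrap from the terminal state
            else:
                a2 = epsilon_greedy(Q, s2, epsilon, rng)  # on-policy next action
                target = r + gamma * Q[(s2, a2)]
            delta = target - Q[(s, a)]  # TD error
            E[(s, a)] += 1.0  # accumulating trace for the pair just visited
            # Push the TD error back to every pair visited earlier in this episode.
            for key in E:
                Q[key] += alpha * delta * E[key]
                E[key] *= gamma * lam
            if done:
                break
            s, a = s2, a2
    return Q
```

Usage: call `Q = sarsa_lambda()` and read the greedy action in each state with `max((LEFT, RIGHT), key=lambda a: Q[(s, a)])`. In this corridor, moving right is optimal in every state, so the learned greedy policy is expected to choose `RIGHT` everywhere. Setting `lam=0.0` zeroes every trace immediately after its update, which reduces the sketch to one-step SARSA; larger λ spreads each TD error further back along the trajectory.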