This lecture introduces variations of the SARSA algorithm, focusing on Expected SARSA and Q-learning. Both change the target used to update the action-value estimate: Expected SARSA averages the next state's action values, weighted by the probability the current policy assigns to each action, while Q-learning uses the maximum action value in the next state. The instructor explains the differences between these variations and how they affect the learning process.
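The difference between the variants comes down to how the next state's value enters the update target. The following is a minimal sketch of that idea, not the instructor's code: it assumes a tabular Q array indexed as `Q[state, action]` and an epsilon-greedy policy, and the helper names `epsilon_greedy_probs`, `td_target`, and `td_update` are hypothetical.

```python
import numpy as np


def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy (assumed policy form)."""
    n = len(q_row)
    probs = np.full(n, epsilon / n)                # exploration mass, spread evenly
    probs[int(np.argmax(q_row))] += 1.0 - epsilon  # remaining mass on the greedy action
    return probs


def td_target(kind, Q, reward, next_state, gamma, epsilon, next_action=None):
    """Return the update target for the chosen variant (hypothetical helper)."""
    q_next = Q[next_state]
    if kind == "sarsa":
        # SARSA: value of the action actually sampled in the next state.
        bootstrap = q_next[next_action]
    elif kind == "expected_sarsa":
        # Expected SARSA: policy-weighted average over all next actions.
        bootstrap = float(np.dot(epsilon_greedy_probs(q_next, epsilon), q_next))
    elif kind == "q_learning":
        # Q-learning: maximum action value in the next state.
        bootstrap = float(np.max(q_next))
    else:
        raise ValueError(f"unknown variant: {kind}")
    return reward + gamma * bootstrap


def td_update(Q, state, action, target, alpha):
    """Move Q[state, action] a fraction alpha of the way toward the target."""
    Q[state, action] += alpha * (target - Q[state, action])
```

For example, with next-state values `[1.0, 3.0]`, reward 1, `gamma = 0.5`, `epsilon = 0.2`, and a sampled next action of 0, the three targets are 1.5 (SARSA), 2.4 (Expected SARSA), and 2.5 (Q-learning). The same `td_update` step is shared by all three; only the target differs.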