This lecture introduces bandit problems in reinforcement learning, where an agent chooses among several actions and immediately receives a reward for the chosen action. The slides cover one-step-horizon games, Q-values, the optimal policy, iterative update rules, empirical averaging, and convergence in expectation.
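To make the link between empirical averaging and the iterative update rule concrete, here is a minimal Python sketch. It is not taken from the slides: the function names `update_q` and `run_bandit`, the Gaussian reward noise, and the uniformly random choice of actions are all assumptions made for illustration. The sketch estimates each Q-value as the running average of the rewards observed for that action, using the incremental form Q <- Q + (r - Q) / N, which gives the same result as recomputing the sample mean from scratch.

```python
import random


def update_q(q, counts, action, reward):
    """Incremental empirical average for one action: Q <- Q + (r - Q) / N."""
    counts[action] += 1
    q[action] += (reward - q[action]) / counts[action]


def run_bandit(true_means, steps=20000, seed=0):
    """Estimate Q-values of a one-step bandit from sampled rewards.

    Assumptions (hypothetical, for illustration only): rewards are Gaussian
    with unit variance around true_means[action], and actions are chosen
    uniformly at random so that every action keeps being sampled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k       # current Q-value estimates
    counts = [0] * k    # number of times each action was taken
    for _ in range(steps):
        action = rng.randrange(k)
        reward = rng.gauss(true_means[action], 1.0)
        update_q(q, counts, action, reward)
    return q, counts


def greedy_action(q):
    """Greedy policy with respect to the current estimates: argmax_a Q(a)."""
    return max(range(len(q)), key=lambda a: q[a])


if __name__ == "__main__":
    q, counts = run_bandit([0.2, 0.8])
    print("estimated Q-values:", q)
    print("greedy action:", greedy_action(q))
```

Because each estimate is a sample mean of rewards for its action, its expectation equals the true mean reward of that action, and with enough samples the greedy action under the estimates coincides with the optimal action.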