This lecture covers the implementation of reactive agents that learn from observations, focusing on the exploration-exploitation tradeoff that arises when the effects of actions are initially unknown. It discusses scenarios where an adversary can influence the world and techniques for developing strategies that remain robust in such settings. Topics include multi-armed bandits, Q-learning, and contextual bandits, along with strategies such as epsilon-greedy, Thompson sampling, and regret matching. The lecture also explores the challenges that state transitions introduce into learning and how deep Q-learning and experience replay address them.
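To make the exploration-exploitation tradeoff concrete, the following is a minimal sketch of the epsilon-greedy strategy on a Bernoulli multi-armed bandit. It is an illustration under assumed details, not the lecture's own code: the function name `epsilon_greedy_bandit`, the arm success probabilities, and the value of epsilon are all hypothetical choices.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Epsilon-greedy on a stationary Bernoulli bandit (hypothetical setup)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms    # number of pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        # Explore with probability epsilon; otherwise exploit the current best estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: values[a])
        # Bernoulli reward drawn from the arm's true mean, which the agent never sees.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        total_reward += reward
    return values, total_reward

if __name__ == "__main__":
    estimates, reward = epsilon_greedy_bandit([0.2, 0.5, 0.8])
    print("estimated arm values:", estimates)
    print("total reward:", reward)
```

With a small constant epsilon, the agent keeps sampling every arm occasionally, so its value estimates converge toward the true means while most pulls go to the arm it currently believes is best; setting epsilon to zero would risk locking onto a suboptimal arm after a few unlucky early rewards.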