Explores bug finding, verification, and learning-aided approaches to program reasoning, with examples such as the Heartbleed bug and differential Bayesian reasoning.
Covers deep reinforcement learning techniques for continuous control, focusing on proximal policy optimization methods and their advantages over standard policy gradient approaches.
Covers model-free prediction methods in reinforcement learning, focusing on Monte Carlo and temporal-difference (TD) methods for estimating value functions without knowledge of the transition dynamics.
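
As a rough illustration of the model-free prediction summary above, the following minimal sketch contrasts every-visit Monte Carlo with TD(0) on a small random walk. The environment (five non-terminal states, a reward of 1 only on exiting to the right), the function names, and the step size are assumptions chosen for illustration and are not taken from the lecture itself. Both estimators learn only from sampled transitions and never consult the transition probabilities.

```python
import random

N_STATES = 5   # assumed toy chain: non-terminal states 1..5, terminals 0 and 6
GAMMA = 1.0    # undiscounted episodic task


def run_episode(rng):
    """Sample one episode; return a list of (state, reward, next_state)."""
    state = 3  # start in the middle of the chain
    transitions = []
    while state not in (0, N_STATES + 1):
        next_state = state + rng.choice((-1, 1))  # move left or right at random
        reward = 1.0 if next_state == N_STATES + 1 else 0.0
        transitions.append((state, reward, next_state))
        state = next_state
    return transitions


def mc_prediction(episodes, rng):
    """Every-visit Monte Carlo: average the full return seen after each visit."""
    totals = [0.0] * (N_STATES + 2)
    counts = [0] * (N_STATES + 2)
    for _ in range(episodes):
        g = 0.0
        # Walk the episode backwards so the return accumulates incrementally.
        for state, reward, _ in reversed(run_episode(rng)):
            g = reward + GAMMA * g
            totals[state] += g
            counts[state] += 1
    return [totals[s] / counts[s] if counts[s] else 0.0
            for s in range(N_STATES + 2)]


def td0_prediction(episodes, rng, alpha=0.01):
    """TD(0): bootstrap from the current estimate of the next state's value."""
    values = [0.0] * (N_STATES + 2)  # terminal values stay at 0
    for _ in range(episodes):
        for state, reward, next_state in run_episode(rng):
            target = reward + GAMMA * values[next_state]
            values[state] += alpha * (target - values[state])
    return values
```

For this particular chain the true value of state s is s/6, so calling `mc_prediction(5000, random.Random(0))` or `td0_prediction(20000, random.Random(1), alpha=0.005)` should give estimates near 1/6, 2/6, ..., 5/6 for states 1 to 5. Monte Carlo waits until the end of each episode and uses the complete return, whereas TD(0) updates after every step using its own current estimate as part of the target.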