This lecture introduces policy gradient methods through a simple example: a single neuron with a binary output. It focuses on the disadvantages of Q-learning, SARSA, and TD-learning more generally, and explains the basic idea behind policy gradient methods, which optimize the expected reward directly by adjusting the parameters of the policy.
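A minimal sketch of the single-neuron idea, assuming a logistic neuron whose output p = sigmoid(w · x) is read as the probability of choosing action 1 over action 0, and a REINFORCE-style update (w ← w + η · r · ∂ log π(a|x)/∂w). For this Bernoulli-logistic neuron the gradient of the log-probability works out to (a − p) · x. The two-stimulus task, the ±1 reward, and the names `train`, `eta`, and `stimuli` are hypothetical illustrations, not the lecture's own setup.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(episodes: int = 2000, eta: float = 0.1, seed: int = 0) -> list[float]:
    """Train one logistic neuron with a REINFORCE-style policy gradient update."""
    rng = random.Random(seed)
    w = [0.0, 0.0]  # weights of the single neuron (no bias, for brevity)
    # Hypothetical task: stimulus [1, 0] rewards action 1, stimulus [0, 1] rewards action 0.
    stimuli = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
    for _ in range(episodes):
        x, correct = rng.choice(stimuli)
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))  # P(a = 1 | x)
        a = 1 if rng.random() < p else 0                    # sample a binary action
        r = 1.0 if a == correct else -1.0                   # reward from the environment
        # d/dw log pi(a | x) = (a - p) * x for a Bernoulli-logistic neuron,
        # so the reward scales a step that makes the sampled action more (or less) likely.
        for i in range(len(w)):
            w[i] += eta * r * (a - p) * x[i]
    return w


if __name__ == "__main__":
    w = train()
    print("weights:", w)
    print("P(a=1 | [1,0]) =", sigmoid(w[0]))
    print("P(a=1 | [0,1]) =", sigmoid(w[1]))
```

No value function or TD target appears anywhere: the neuron's parameters are pushed directly in the direction that increases the probability of rewarded actions, which is the core contrast with Q-learning and SARSA. After training, `sigmoid(w[0])` should be close to 1 and `sigmoid(w[1])` close to 0.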