This lecture introduces policy gradient methods through a simple example: a single neuron with binary output. It discusses the disadvantages of Q-learning, SARSA, and TD-learning, and explains the basic idea of policy gradient methods, which is to optimize rewards directly.
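To make the single-neuron example concrete, here is a minimal sketch of a policy gradient update for a stochastic neuron with binary output. It is not taken from the lecture: the task (output 1 for positive inputs and 0 for negative ones), the reward of 1 for a correct output, the names `train_neuron`, `w`, `b`, `lr`, and the learning rate are all assumptions chosen for illustration. The neuron fires with probability `sigmoid(w * x + b)`, and the weights move along the reward times the gradient of the log-probability of the chosen action, which for this neuron is `(a - p) * x`.

```python
import math
import random


def sigmoid(z):
    """Logistic function: probability that the neuron outputs 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(n_steps=5000, lr=0.1, seed=0):
    """Train one stochastic binary neuron with a reward-weighted
    log-likelihood gradient (a REINFORCE-style update).

    Hypothetical task: the input x is -1 or +1, and the reward is 1
    when the binary output matches (x > 0), otherwise 0.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(n_steps):
        x = rng.choice([-1.0, 1.0])
        p = sigmoid(w * x + b)            # policy: P(a = 1 | x)
        a = 1 if rng.random() < p else 0  # sample the binary action
        target = 1 if x > 0 else 0
        r = 1.0 if a == target else 0.0   # assumed reward signal
        # d/dz log pi(a | x) = (a - p) for a Bernoulli-sigmoid policy,
        # so the update is lr * reward * (a - p) * input.
        w += lr * r * (a - p) * x
        b += lr * r * (a - p)
    return w, b


w, b = train_neuron()
p_pos = sigmoid(w * 1.0 + b)   # P(output = 1) for x = +1
p_neg = sigmoid(w * -1.0 + b)  # P(output = 1) for x = -1
print(f"w={w:.2f}, b={b:.2f}, p_pos={p_pos:.3f}, p_neg={p_neg:.3f}")
```

Note that no value function is estimated here, in contrast to Q-learning, SARSA, or TD-learning: the parameters of the policy are adjusted directly from sampled rewards.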