A Large Deviations Perspective on Policy Gradient Algorithms

Daniel Kuhn, Mengmeng Li, Wouter Jongeneel
2024

Abstract

Motivated by policy gradient methods in the context of reinforcement learning, we derive the first large deviation rate function for the iterates generated by stochastic gradient descent for possibly non-convex objectives satisfying a Polyak-Łojasiewicz condition. Leveraging the contraction principle from large deviations theory, we illustrate the potential of this result by showing how convergence properties of policy gradient with a softmax parametrization and an entropy regularized objective can be naturally extended to a wide spectrum of other policy parametrizations.
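
For reference, the two standard notions named in the abstract can be stated as follows. This is a sketch in generic notation rather than the paper's own: the symbols f, f^\star, \mu, I, and g are assumptions chosen for illustration, and the paper's precise assumptions and scaling may differ.

```latex
% Polyak-Łojasiewicz (PL) condition for a differentiable objective f
% with infimum f^\star and some constant \mu > 0 (generic notation, assumed):
\frac{1}{2}\,\lVert \nabla f(x) \rVert^2 \;\ge\; \mu \bigl( f(x) - f^\star \bigr)
\quad \text{for all } x .

% Contraction principle: if (X_n) satisfies a large deviation principle
% with good rate function I and g is continuous, then (g(X_n)) satisfies
% a large deviation principle with rate function
I'(y) \;=\; \inf \{\, I(x) : g(x) = y \,\} .
```

The PL condition does not require convexity, which is why it fits the non-convex setting the abstract mentions. The contraction principle is what would allow a rate function obtained for one parametrization to be carried over to others through a continuous map.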

Official source

https://infoscience.epfl.ch/entities/publication/a3c51d35-59a3-43c7-8885-50c6c30bdb41
