Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets

We propose a policy gradient algorithm for robust infinite-horizon Markov Decision Processes (MDPs) with non-rectangular uncertainty sets, thereby addressing an open challenge in the robust MDP literature. Indeed, uncertainty sets that display statistical optimality properties and make optimal use of limited data often fail to be rectangular. Unfortunately, the corresponding robust MDPs cannot be solved with dynamic programming techniques and are in fact provably intractable. This prompts us to develop a projected Langevin dynamics algorithm tailored to the robust policy evaluation problem, which offers global optimality guarantees. We also propose a deterministic policy gradient method that solves the robust policy evaluation problem approximately, and we prove that the approximation error scales with a new measure of non-rectangularity of the uncertainty set. Numerical experiments show that our projected Langevin dynamics algorithm can escape local optima, while algorithms tailored to rectangular uncertainty fail to do so.
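
To illustrate the general idea of projected Langevin dynamics mentioned in the abstract, the following is a minimal sketch on a toy problem. It is not the paper's algorithm: the one-dimensional double-well objective `f`, the interval `[-2, 2]` standing in for an uncertainty set, and all names and parameter values (`projected_langevin`, `step`, `inv_temp`) are assumptions chosen purely for illustration. Each iteration takes a gradient step, adds Gaussian noise scaled by `sqrt(2 * step / inv_temp)`, and projects back onto the feasible set. Setting `inv_temp` to infinity removes the noise and recovers plain projected gradient descent.

```python
import math
import random


def f(x):
    """Toy non-convex objective (assumed): local min near 0.93, global min near -1.06."""
    return x**4 - 2.0 * x**2 + 0.5 * x


def grad_f(x):
    """Derivative of the toy objective."""
    return 4.0 * x**3 - 4.0 * x + 0.5


def project(x, lo=-2.0, hi=2.0):
    """Euclidean projection onto the interval [lo, hi] (stand-in for the feasible set)."""
    return min(max(x, lo), hi)


def projected_langevin(x0, step, inv_temp, n_steps, seed=0):
    """Run projected Langevin dynamics; return (last iterate, best iterate seen).

    inv_temp = math.inf gives a noise scale of zero, i.e. projected gradient descent.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step / inv_temp)
    x = project(x0)
    best_x = x
    for _ in range(n_steps):
        xi = rng.gauss(0.0, 1.0)  # standard Gaussian perturbation
        x = project(x - step * grad_f(x) + noise_scale * xi)
        if f(x) < f(best_x):
            best_x = x
    return x, best_x


if __name__ == "__main__":
    # Noise-free run: settles in the local minimum closest to the start.
    x_pgd, _ = projected_langevin(1.5, 0.01, math.inf, 5000)
    # Noisy run: the injected noise lets the iterates cross the barrier.
    _, best_pld = projected_langevin(1.5, 0.01, 2.0, 20000)
    print("projected gradient descent ends at", x_pgd, "with value", f(x_pgd))
    print("projected Langevin best iterate", best_pld, "with value", f(best_pld))
```

In this toy setting the noise-free iteration started at `1.5` stays in the shallower well, whereas the noisy iteration can cross the barrier and visit the deeper one. This mirrors, in a much simpler setting, the abstract's claim about escaping local optima.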
