Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets

We propose a policy gradient algorithm for robust infinite-horizon Markov Decision Processes (MDPs) with non-rectangular uncertainty sets, thereby addressing an open challenge in the robust MDP literature. Indeed, uncertainty sets that display statistical optimality properties and make optimal use of limited data often fail to be rectangular. Unfortunately, the corresponding robust MDPs cannot be solved with dynamic programming techniques and are in fact provably intractable. This prompts us to develop a projected Langevin dynamics algorithm tailored to the robust policy evaluation problem, which offers global optimality guarantees. We also propose a deterministic policy gradient method that solves the robust policy evaluation problem approximately, and we prove that the approximation error scales with a new measure of non-rectangularity of the uncertainty set. Numerical experiments show that our projected Langevin dynamics algorithm can escape local optima, while algorithms tailored to rectangular uncertainty sets fail to do so.
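The abstract does not spell out the algorithm, so the following is only a minimal sketch, under our own assumptions, of what projected Langevin dynamics for robust policy evaluation can look like. We assume a hypothetical uncertainty set made of convex combinations of K candidate transition kernels with a single weight vector theta shared by all states. Because the same weights couple every state, such a set is generally not rectangular. For a fixed policy, the worst-case value J(theta) = rho0^T (I - gamma P(theta))^{-1} r is minimized over the probability simplex with noisy projected gradient steps theta <- Proj(theta - eta * grad J(theta) + sqrt(2 eta / beta) * xi). The function names (project_simplex, value_and_grad, projected_langevin), the step size eta, the inverse temperature beta and the toy model are illustrative. The paper's actual parameterization, step-size schedule and guarantees may differ.

```python
import numpy as np


def project_simplex(x):
    """Euclidean projection of x onto the probability simplex (sort-based method)."""
    u = np.sort(x)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, x.size + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(x - tau, 0.0)


def value_and_grad(theta, kernels_pi, r_pi, rho0, gamma):
    """Return J(theta) = rho0^T v and its gradient with respect to the mixture weights.

    Hypothetical setup: kernels_pi has shape (K, S, S), and kernels_pi[k] is the
    state-to-state transition matrix induced by the fixed policy under model k.
    """
    S = r_pi.size
    P = np.tensordot(theta, kernels_pi, axes=1)                 # mixture kernel P(theta)
    A = np.eye(S) - gamma * P
    v = np.linalg.solve(A, r_pi)                                # v = (I - gamma P)^{-1} r
    d = np.linalg.solve(A.T, rho0)                              # unnormalized discounted occupancy
    grad = gamma * np.einsum("s,kst,t->k", d, kernels_pi, v)    # dJ/dtheta_k = gamma d^T P_k v
    return rho0 @ v, grad


def projected_langevin(kernels_pi, r_pi, rho0, gamma,
                       step=1e-2, beta=1e3, iters=5000, seed=0):
    """Approximate the worst-case value min_theta J(theta) over the simplex."""
    rng = np.random.default_rng(seed)
    K = kernels_pi.shape[0]
    theta = np.full(K, 1.0 / K)                                 # start at the uniform mixture
    best_theta, best_J = theta, np.inf
    for _ in range(iters):
        J, g = value_and_grad(theta, kernels_pi, r_pi, rho0, gamma)
        if J < best_J:                                          # keep the best iterate seen so far
            best_theta, best_J = theta, J
        noise = rng.standard_normal(K)
        # Gradient step plus Gaussian noise at inverse temperature beta, then project back.
        theta = project_simplex(theta - step * g + np.sqrt(2.0 * step / beta) * noise)
    return best_theta, best_J


# Toy check: state 0 pays reward 1 and state 1 pays 0. Model A always moves to state 1,
# model B always moves to state 0, so the worst case puts all weight on model A.
P_A = np.array([[0.0, 1.0], [0.0, 1.0]])
P_B = np.array([[1.0, 0.0], [1.0, 0.0]])
Ks = np.stack([P_A, P_B])
r = np.array([1.0, 0.0])
rho = np.array([1.0, 0.0])
theta, J = projected_langevin(Ks, r, rho, gamma=0.9)
print(theta, J)  # theta ends near [1, 0] and J near the analytic worst case 1.0
```

In this toy model the value starting from state 0 is J = 1 + gamma (1 - t) / (1 - gamma), where t is the weight on model A, so the minimum J = 1 is attained at t = 1. The injected noise plays no role in such a monotone example; it matters when J has several local minima over the uncertainty set, which is the situation the abstract targets.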

Data-driven IQC-Based Robust Control Design for Hybrid Micro-disturbance Isolation Platform

Probabilistic methods for neural combinatorial optimization

Reinforcement Learning for Joint Design and Control of Battery-PV Systems