Mean field for Markov Decision Processes: from Discrete to Continuous Optimization

We study the convergence of Markov Decision Processes made of a large number of objects to optimization problems on ordinary differential equations (ODE). We show that the optimal reward of such a Markov Decision Process, satisfying a Bellman equation, converges to the solution of a continuous Hamilton-Jacobi-Bellman (HJB) equation based on the mean field approximation of the Markov Decision Process. We give bounds on the difference of the rewards, and a constructive algorithm for deriving an approximating solution to the Markov Decision Process from a solution of the HJB equations. We illustrate the method on three examples pertaining respectively to investment strategies, population dynamics control and scheduling in queues are developed. They are used to illustrate and justify the construction of the controlled ODE and to show the gain obtained by solving a continuous HJB equation rather than a large discrete Bellman equation.

Chattez avec Graph Search

Posez n’importe quelle question sur les cours, conférences, exercices, recherches, actualités, etc. de l’EPFL ou essayez les exemples de questions ci-dessous.

AVERTISSEMENT : Le chatbot Graph n'est pas programmé pour fournir des réponses explicites ou catégoriques à vos questions. Il transforme plutôt vos questions en demandes API qui sont distribuées aux différents services informatiques officiellement administrés par l'EPFL. Son but est uniquement de collecter et de recommander des références pertinentes à des contenus que vous pouvez explorer pour vous aider à répondre à vos questions.

Mean field for Markov Decision Processes: from Discrete to Continuous Optimization

Graph Chatbot

Chattez avec Graph Search

Multi-robot task allocation for safe planning against stochastic hazard dynamics

Optimal containment control for a class of heterogeneous multi-agent systems with actuator faults

A Planner-Trader Decomposition for Multi-Market Hydro Scheduling

Optimal containment control for a class of heterogeneous multi-agent systems with actuator faults

Multi-robot task allocation for safe planning against stochastic hazard dynamics

A Planner-Trader Decomposition for Multi-Market Hydro Scheduling