Partially observable Markov decision process

A partially observable Markov decision process (POMDP) is a generalization of a Markov decision process (MDP). A POMDP models an agent's decision process in which the system dynamics are assumed to be determined by an MDP, but the agent cannot directly observe the underlying state. Instead, it must rely on a sensor model (the probability distribution of different observations given the underlying state) together with the underlying MDP. Whereas a policy in an MDP maps underlying states to actions, a POMDP policy maps the history of observations (or the belief states that summarize it) to actions. The POMDP framework is general enough to model a variety of real-world sequential decision processes. Applications include robot navigation, machine maintenance, and planning under uncertainty in general.

The general framework of Markov decision processes with imperfect information was described by Karl Johan Åström in 1965 for the case of a discrete state space, and it was further studied in the operations research community, where the acronym POMDP was coined. It was later adapted to problems in artificial intelligence and automated planning by Leslie P. Kaelbling and Michael L. Littman.

An exact solution to a POMDP yields the optimal action for each possible belief over the world states. The optimal action maximizes the expected reward (or minimizes the cost) of the agent over a possibly infinite horizon. The resulting mapping from beliefs to optimal actions is known as the agent's optimal policy for interacting with its environment.

A discrete-time POMDP models the relationship between an agent and its environment. Formally, a POMDP is a 7-tuple $(S, A, T, R, \Omega, O, \gamma)$, where $S$ is a set of states, $A$ is a set of actions, $T$ is a set of conditional transition probabilities between states, $R: S \times A \to \mathbb{R}$ is the reward function, $\Omega$ is a set of observations, $O$ is a set of conditional observation probabilities, and $\gamma \in [0, 1)$ is the discount factor. At each time period, the environment is in some state $s \in S$. The agent takes an action $a \in A$, which causes the environment to transition to state $s'$ with probability $T(s' \mid s, a)$.
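To make the 7-tuple and the idea of a belief state concrete, here is a minimal Python sketch for a finite POMDP. It assumes the standard convention that the observation probability is written $O(o \mid s', a)$, i.e. it depends on the new state and the action just taken, and it stores $T$ and $O$ as nested dictionaries. The class name `POMDP`, the methods `step` and `update_belief`, and the two-state "listening" example with its probabilities (0.85/0.15) and reward (-1) are all hypothetical, chosen only for illustration. The belief update is the usual Bayes filter, $b'(s') \propto O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)$.

```python
import random
from dataclasses import dataclass
from typing import Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable
Obs = Hashable
Belief = Dict[State, float]


@dataclass
class POMDP:
    """Hypothetical finite POMDP holding the 7-tuple (S, A, T, R, Omega, O, gamma)."""
    states: List[State]                                  # S
    actions: List[Action]                                # A
    T: Dict[Tuple[State, Action], Dict[State, float]]    # T[(s, a)][s'] = T(s' | s, a)
    R: Dict[Tuple[State, Action], float]                 # R[(s, a)] = R(s, a)
    observations: List[Obs]                              # Omega
    O: Dict[Tuple[State, Action], Dict[Obs, float]]      # O[(s', a)][o] = O(o | s', a) (assumed convention)
    gamma: float                                         # discount factor in [0, 1)

    def step(self, s: State, a: Action, rng: random.Random) -> Tuple[State, Obs, float]:
        """Simulate one time period: s' ~ T(. | s, a), o ~ O(. | s', a), reward R(s, a)."""
        trans = self.T[(s, a)]
        s_next = rng.choices(list(trans), weights=list(trans.values()))[0]
        obs_dist = self.O[(s_next, a)]
        o = rng.choices(list(obs_dist), weights=list(obs_dist.values()))[0]
        return s_next, o, self.R[(s, a)]

    def update_belief(self, b: Belief, a: Action, o: Obs) -> Belief:
        """Bayes filter: b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
        unnormalized = {}
        for s_next in self.states:
            # Prediction step: probability of reaching s_next from the current belief.
            predicted = sum(self.T[(s, a)].get(s_next, 0.0) * b.get(s, 0.0) for s in self.states)
            # Correction step: weight by how likely the observation is in s_next.
            unnormalized[s_next] = self.O[(s_next, a)].get(o, 0.0) * predicted
        total = sum(unnormalized.values())
        if total == 0.0:
            raise ValueError("observation has zero probability under this belief and action")
        return {s: p / total for s, p in unnormalized.items()}


def make_listening_pomdp() -> POMDP:
    """Hypothetical two-state example: a hidden side, and a noisy 'listen' action."""
    states = ["left", "right"]
    listen = "listen"
    T = {("left", listen): {"left": 1.0}, ("right", listen): {"right": 1.0}}  # listening never changes the state
    O = {
        ("left", listen): {"hear-left": 0.85, "hear-right": 0.15},
        ("right", listen): {"hear-left": 0.15, "hear-right": 0.85},
    }
    R = {("left", listen): -1.0, ("right", listen): -1.0}  # small cost for each listen
    return POMDP(states, [listen], T, R, ["hear-left", "hear-right"], O, gamma=0.95)


if __name__ == "__main__":
    model = make_listening_pomdp()
    belief = {"left": 0.5, "right": 0.5}
    for _ in range(2):
        belief = model.update_belief(belief, "listen", "hear-left")
        print(belief)
```

Starting from a uniform belief, one "hear-left" observation moves the probability of "left" to about 0.85, and a second one moves it higher still, even though the agent never sees the state itself. This is the sense in which a POMDP policy can act on beliefs rather than on states. Dictionaries keep sparse $T$ and $O$ compact; a dense array representation would be the natural choice for larger models.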
