# Markov decision process

Summary

In mathematics, a Markov decision process (MDP) is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying optimization problems solved via dynamic programming. MDPs were known at least as early as the 1950s; a core body of research on Markov decision processes resulted from Ronald Howard's 1960 book, Dynamic Programming and Markov Processes. They are used in many disciplines, including robotics, automatic control, economics and manufacturing. The name of MDPs comes from the Russian mathematician Andrey Markov as they are an extension of Markov chains.
At each time step, the process is in some state s, and the decision maker may choose any action a that is available in state s. The process responds at the next time step by randomly moving into a new state s' and giving the decision maker a corresponding reward R_a(s, s').
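The summary above can be made concrete with a small numerical sketch. Value iteration is the classic dynamic-programming method for MDPs; the two-state, two-action instance below is invented purely for illustration.

```python
import numpy as np

# Toy MDP (a hypothetical illustration, not taken from the text):
# P[a][s][s'] is the probability of moving to state s' when taking
# action a in state s; R[a][s] is the expected immediate reward.
P = np.array([
    [[0.9, 0.1],   # action 0
     [0.2, 0.8]],
    [[0.5, 0.5],   # action 1
     [0.0, 1.0]],
])
R = np.array([
    [1.0, 0.0],    # rewards for action 0 in states 0, 1
    [2.0, -1.0],   # rewards for action 1 in states 0, 1
])
gamma = 0.9        # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    """Compute the optimal value function by repeated Bellman updates."""
    V = np.zeros(P.shape[1])
    while True:
        # Q[a, s] = R[a, s] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * P @ V
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

V, policy = value_iteration(P, R, gamma)
print(V, policy)
```

The returned `policy` maps each state to the action that attains the maximum in the final Bellman backup, which is optimal once the values have converged.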

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.


Related concepts (13)

Game theory

Game theory is the study of mathematical models of strategic interactions among rational agents. It has applications in all fields of social science, as well as in logic, systems science and computer science.

Machine learning

Machine learning (ML) is an umbrella term for solving problems for which development of algorithms by human programmers would be cost-prohibitive, and instead the problems are solved by helping machines discover their own algorithms, without needing to be explicitly told what to do by any human-developed algorithms.

Artificial intelligence

Artificial intelligence (AI) is the intelligence of machines or software, as opposed to the intelligence of human beings or animals. AI applications include advanced web search engines (e.g., Google Search), recommendation systems, and self-driving cars.


Related courses (36)

CS-430: Intelligent agents

Software agents are widely used to control physical, economic and financial processes. The course presents practical methods for implementing software agents and multi-agent systems, supported by programming exercises, and the theoretical underpinnings including computational game theory.

MGT-484: Applied probability & stochastic processes

This course focuses on dynamic models of random phenomena, and in particular, the most popular classes of such models: Markov chains and Markov decision processes. We will also study applications in queuing theory, finance, project management, etc.

COM-500: Statistical signal and data processing through applications

Building up on the basic concepts of sampling, filtering and Fourier transforms, we address stochastic modeling, spectral analysis, estimation and prediction, classification, and adaptive filtering, with an application oriented approach and hands-on numerical exercises.

Related publications (100)

The main objective of this thesis is to model a regatta in the America's Cup, and more precisely the first leg of the race, where the two competing sailboats have to move upwind. During the race, each crew attempts to be the first to reach the end of this leg, and is allowed to hinder its opponent as long as certain rules are respected. A boat which finishes this leg first is often able to keep its lead until the end of the regatta. An essential ingredient of the problem is the study of the different maneuvers that the boats can execute during the race, as well as the effect of a boat's presence on the wind around it. Indeed, a boat located just behind its opponent receives less wind in its sails, and its speed is consequently reduced. The boats considered in our model are of the Class America type, which were used in the America's Cup between 1982 and 2007.

The race can hence be viewed as a game between two players, assumed identical, in which each of them makes decisions sequentially. The goal of each player is to finish the first upwind leg with the largest possible lead over the opponent. The boats progress in an environment in which the wind fluctuates unpredictably. Thus, the game is a sequential stochastic game, in which each player tries to determine the sequence of actions that is most favorable on average. A mathematical study of this kind of game is carried out, and a strategy is built using tools of dynamic programming. A theorem proves that this strategy is optimal, in the sense that no player has an interest in deviating from it.

The last part of this thesis applies the mathematical study of sequential stochastic games to the context of a sailing regatta. Given the current positions of the boats and the current state of the wind, a set of available actions and reactions for the boats can be defined. Each choice brings the boats to new positions, where a new decision process begins in a new state of the wind. Once the rules of the game are established, the objective is to define an algorithm that builds an approximation of the optimal strategy. The implementation of this algorithm could produce a decision-support tool that the crew could use on board during the regatta.
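The backward-induction idea described in this abstract can be sketched in miniature. The model below is not the thesis's actual regatta model: the state (an integer "advantage" of one boat over the other), the actions, the wind distribution, the horizon, and the alternating-move assumption are all simplifying inventions, kept only to show how a dynamic-programming strategy for a two-player zero-sum sequential stochastic game is computed.

```python
# Schematic backward induction for a toy two-player zero-sum sequential
# stochastic game. Everything here (state space, actions, wind model,
# horizon) is invented for illustration.
ACTIONS = (-1, 0, 1)                 # e.g. tack left, hold course, tack right
WIND = {-1: 0.25, 0: 0.5, 1: 0.25}   # distribution of random wind shifts
HORIZON = 4
STATES = range(-10, 11)

def clamp(s):
    """Keep the advantage inside the modeled state space."""
    return max(-10, min(10, s))

def solve(horizon):
    # V[t][s]: expected final advantage under optimal play from stage t
    # in state s; the terminal payoff is the advantage itself.
    V = {horizon: {s: s for s in STATES}}
    for t in range(horizon - 1, -1, -1):
        V[t] = {}
        for s in STATES:
            # Player 1 maximizes, player 2 then minimizes (alternating
            # moves), and the wind shift w is averaged out.
            V[t][s] = max(
                min(
                    sum(p * V[t + 1][clamp(s + a - b + w)]
                        for w, p in WIND.items())
                    for b in ACTIONS
                )
                for a in ACTIONS
            )
    return V

V = solve(HORIZON)
```

Because the two players are identical and the wind is symmetric, the game value from an even position is zero, and a larger current advantage yields a larger expected final advantage.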

Eli Gutin, Daniel Kuhn, Wolfram Wiesemann

In a stochastic interdiction game a proliferator aims to minimize the expected duration of a nuclear weapons development project, and an interdictor endeavors to maximize the project duration by delaying some of the project tasks. We formulate static and dynamic versions of the interdictor's decision problem where the interdiction plan is either precommitted or adapts to new information revealed over time, respectively. The static model gives rise to a stochastic program, whereas the dynamic model is formalized as a multiple optimal stopping problem in continuous time and with decision-dependent information. Under a memoryless probabilistic model for the task durations, we prove that the static model reduces to a mixed-integer linear program, whereas the dynamic model reduces to a finite Markov decision process in discrete time that can be solved via efficient value iteration. We then generalize the dynamic model to account for uncertainty in the outcomes of the interdiction actions. We also discuss a crashing game where the proliferator can use limited resources to expedite tasks so as to counterbalance the interdictor's efforts. The resulting problem can be formulated as a robust Markov decision process.
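The reduction to a finite MDP solved by value iteration, and the final robust-MDP formulation, can be illustrated with a toy sketch. Here the ambiguity is modeled as a small finite set of candidate transition kernels, a simplification of the paper's setting; the two-state instance is invented for illustration.

```python
import numpy as np

# Robust value iteration: identical to standard value iteration except
# each Bellman backup takes the worst case over an ambiguity set of
# transition kernels (here, a finite set of two candidates).
P_set = [
    np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]]),
    np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.3, 0.7], [0.0, 1.0]]]),
]
R = np.array([[1.0, 0.0],     # rewards for action 0 in states 0, 1
              [2.0, -1.0]])   # rewards for action 1 in states 0, 1
gamma = 0.9

def robust_value_iteration(P_set, R, gamma, tol=1e-8):
    """Fixed point of the robust Bellman operator (worst-case kernel)."""
    V = np.zeros(R.shape[1])
    while True:
        # Worst case over the ambiguity set, then best action per state.
        Q = np.min([R + gamma * P @ V for P in P_set], axis=0)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

V = robust_value_iteration(P_set, R, gamma)
print(V)
```

Since the robust operator is still a gamma-contraction, the iteration converges to a unique fixed point, just as in the standard case.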

We consider the problem of estimating a probability distribution that maximizes the entropy while satisfying a finite number of moment constraints, possibly corrupted by noise. Based on duality of convex programming, we present a novel approximation scheme using a smoothed fast gradient method that is equipped with explicit bounds on the approximation error. We further demonstrate how the presented scheme can be used for approximating the chemical master equation through the zero-information moment closure method, and for an approximate dynamic programming approach in the context of constrained Markov decision processes with uncountable state and action spaces.

2019
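The duality-based scheme in the entropy-maximization abstract above can be illustrated on a toy instance: a distribution on a finite support with a single mean constraint, where plain gradient ascent on the dual variable stands in for the paper's smoothed fast gradient method. All names and numbers below are invented for illustration.

```python
import numpy as np

# Maximum entropy subject to a mean constraint, solved via the dual.
# By convex duality, the solution has exponential-family form
# p_i proportional to exp(lam * x_i), and the dual gradient is the
# constraint residual m - E_p[X].
x = np.arange(5.0)   # support points 0..4
m = 2.5              # target mean, E[X] = 2.5

def dual_solve(x, m, lr=0.1, iters=5000):
    lam = 0.0
    for _ in range(iters):
        w = np.exp(lam * x)
        p = w / w.sum()
        lam += lr * (m - p @ x)   # ascend until E_p[X] = m
    return p

p = dual_solve(x, m)
print(p, p @ x)
```

The resulting distribution is strictly positive on the support, sums to one, and matches the prescribed mean to high accuracy.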