Publication

Distributed Value-Function Learning with Linear Convergence Rates

Related concepts (33)

Optimal control

Optimal control theory is a branch of mathematical optimization that deals with finding a control for a dynamical system over a period of time such that an objective function is optimized. It has numerous applications in science, engineering and operations research. For example, the dynamical system might be a spacecraft with controls corresponding to rocket thrusters, and the objective might be to reach the moon with minimum fuel expenditure.

Q-learning

Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations. For any finite Markov decision process (FMDP), Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over any and all successive steps, starting from the current state.
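
As an illustration of the update rule behind this description, the following is a minimal sketch of tabular Q-learning. The environment is a hypothetical five-state chain invented for this example (move left or right, reward 1 on reaching the rightmost state), and the function name and hyperparameters are assumptions rather than anything taken from the publication.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain.

    Actions: 0 = left, 1 = right. Reaching the rightmost state gives
    reward 1 and ends the episode; every other step gives reward 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap so an episode always terminates
            # Epsilon-greedy action selection (ties go to "right").
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 1 if q[s][1] >= q[s][0] else 0
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Model-free update: only the sampled transition is used.
            if s_next == goal:
                target = r
            else:
                target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if s == goal:
                break
    return q


q_table = q_learning()
greedy_policy = [0 if row[0] > row[1] else 1 for row in q_table[:-1]]
```

In this toy chain the greedy policy read off the learned table moves right in every non-terminal state, and the learned values approach the discounted return `gamma ** (steps_to_goal - 1)`.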

Business simulation

Business simulation or corporate simulation is simulation used for business training, education or analysis. It can be scenario-based or numeric-based. Most business simulations are used for business acumen training and development. Learning objectives include: strategic thinking, decision making, problem solving, financial analysis, market analysis, operations, teamwork and leadership. The business gaming community seems lately to have adopted the term business simulation game instead of just gaming or just simulation.

Deep reinforcement learning

Deep reinforcement learning (deep RL) is a subfield of machine learning that combines reinforcement learning (RL) and deep learning. RL considers the problem of a computational agent learning to make decisions by trial and error. Deep RL incorporates deep learning into the solution, allowing agents to make decisions from unstructured input data without manual engineering of the state space. Deep RL algorithms are able to take in very large inputs (e.g.

Computer simulation

Computer simulation is the process of mathematical modelling, performed on a computer, which is designed to predict the behaviour of, or the outcome of, a real-world or physical system. The reliability of some mathematical models can be determined by comparing their results to the real-world outcomes they aim to predict. Computer simulations have become a useful tool for the mathematical modeling of many natural systems in physics (computational physics), astrophysics, climatology, chemistry, biology and manufacturing, as well as human systems in economics, psychology, social science, health care and engineering.

Linear–quadratic regulator

The theory of optimal control is concerned with operating a dynamic system at minimum cost. The case where the system dynamics are described by a set of linear differential equations and the cost is described by a quadratic function is called the LQ problem. One of the main results in the theory is that the solution is provided by the linear–quadratic regulator (LQR), a feedback controller. LQR controllers possess inherent robustness with guaranteed gain and phase margin, and they are also part of the solution to the LQG (linear–quadratic–Gaussian) problem.
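
To make the LQ problem concrete, here is a small sketch for the scalar, infinite-horizon, continuous-time case only, where the algebraic Riccati equation can be solved in closed form. The function name and the example numbers are assumptions chosen for illustration; the matrix case needs a numerical Riccati solver and is not covered here.

```python
import math


def scalar_lqr(a, b, q, r):
    """Scalar infinite-horizon LQR (illustrative sketch).

    System: dx/dt = a*x + b*u
    Cost:   integral of (q*x**2 + r*u**2) dt, with b != 0, q > 0, r > 0.

    Returns (p, k), where p is the positive root of the Riccati equation
    2*a*p - (b**2 / r) * p**2 + q = 0 and k is the gain of u = -k*x.
    """
    root = math.sqrt(a * a + b * b * q / r)
    p = r * (a + root) / (b * b)
    k = b * p / r
    return p, k


# Example: an unstable plant (a = 1) stabilized by the LQR feedback.
p, k = scalar_lqr(a=1.0, b=1.0, q=3.0, r=1.0)
closed_loop_pole = 1.0 - 1.0 * k  # a - b*k
```

For these numbers the Riccati root is `p = 3`, the gain is `k = 3`, and the closed-loop pole `a - b*k` equals `-2`, so the feedback `u = -k*x` stabilizes the plant.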

Rate of convergence

In numerical analysis, the order of convergence and the rate of convergence of a convergent sequence are quantities that represent how quickly the sequence approaches its limit. A sequence $(x_n)$ that converges to $L$ is said to have order of convergence $q \geq 1$ and rate of convergence $\mu$ if $\lim_{n \to \infty} \frac{|x_{n+1} - L|}{|x_n - L|^{q}} = \mu$. The rate of convergence $\mu$ is also called the asymptotic error constant. Note that this terminology is not standardized and some authors will use rate where this article uses order.
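
The order and rate can be estimated numerically from the tail of a sequence whose limit is known. The sketch below is one common way to do this, using the ratio of log error ratios for the order and the ratio of successive errors for the rate. The helper names and the two test sequences (a geometric sequence and Newton's iteration for the square root of 2) are assumptions chosen for illustration.

```python
import math


def estimate_order(xs, limit):
    """Estimate the order of convergence from the last three errors."""
    e0, e1, e2 = [abs(x - limit) for x in xs[-3:]]
    return math.log(e2 / e1) / math.log(e1 / e0)


def estimate_rate(xs, limit, order):
    """Estimate the asymptotic error constant for a given order."""
    e1, e2 = [abs(x - limit) for x in xs[-2:]]
    return e2 / e1 ** order


# Linear convergence: x_n = 0.5**n converges to 0 with order 1, rate 0.5.
linear = [0.5 ** n for n in range(1, 10)]
linear_order = estimate_order(linear, 0.0)
linear_rate = estimate_rate(linear, 0.0, 1)

# Quadratic convergence: Newton's iteration for sqrt(2), four steps.
newton = [1.0]
for _ in range(4):
    x = newton[-1]
    newton.append(0.5 * (x + 2.0 / x))
newton_order = estimate_order(newton, math.sqrt(2.0))
```

The geometric sequence gives an estimated order of 1 with rate 0.5, which is linear convergence, while the Newton iterates give an estimated order close to 2. The Newton run is stopped after four steps because further iterates reach floating-point precision and the error ratios stop being meaningful.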

Supervised learning

Supervised learning (SL) is a paradigm in machine learning where input objects (for example, a vector of predictor variables) and a desired output value (also known as a human-labeled supervisory signal) train a model. The training data is processed, building a function that maps new data to expected output values. An optimal scenario will allow the algorithm to correctly determine output values for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way (see inductive bias).

Inflation targeting

In macroeconomics, inflation targeting is a monetary policy where a central bank follows an explicit target for the inflation rate for the medium-term and announces this inflation target to the public. The assumption is that the best that monetary policy can do to support long-term growth of the economy is to maintain price stability, and price stability is achieved by controlling inflation. The central bank uses interest rates as its main short-term monetary instrument.

Nominal income target

A nominal income target is a monetary policy target. Such targets are adopted by central banks to manage national economic activity. Nominal aggregates are not adjusted for inflation. Nominal income aggregates that can serve as targets include nominal gross domestic product (NGDP) and nominal gross domestic income (GDI). Central banks use a variety of techniques to hit their targets, including conventional tools such as interest rate targeting or open market operations, unconventional tools such as quantitative easing or interest rates on excess reserves, and expectations management.