# Bellman equation

Summary

A Bellman equation, named after Richard E. Bellman, is a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming. It writes the "value" of a decision problem at a certain point in time in terms of the payoff from some initial choices and the "value" of the remaining decision problem that results from those initial choices. This breaks a dynamic optimization problem into a sequence of simpler subproblems, as Bellman's "principle of optimality" prescribes. The equation applies to algebraic structures with a total ordering; for algebraic structures with a partial ordering, a generic form of Bellman's equation can be used.
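As an illustration only (the notation is assumed here rather than taken from this page: state $x$, feasible action set $\Gamma(x)$, per-period payoff $F(x, a)$, deterministic transition $T(x, a)$, discount factor $0 < \beta < 1$), a standard discrete-time, infinite-horizon form of the equation is:

$$
V(x) = \max_{a \in \Gamma(x)} \Big\{ F(x, a) + \beta \, V\big(T(x, a)\big) \Big\}
$$

The first term is the payoff from the initial choice $a$; the second is the discounted value of the remaining decision problem, which starts from the new state $T(x, a)$.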
The Bellman equation was first applied to engineering control theory and to other topics in applied mathematics, and subsequently became an important tool in economic theory, though the basic concepts of dynamic programming are prefigured in John von Neumann and Oskar Morgenstern's *Theory of Games and Economic Behavior*…

This page is generated automatically and may contain information that is incorrect, incomplete, out of date, or not relevant to your search. The same applies to all other pages on this site. Be sure to verify the information against official EPFL sources.

Related publications (16)

Related concepts (12)

Dynamic programming

In computer science, dynamic programming is an algorithmic method for solving optimization problems. The concept was introduced in the early 1950s by Richard Bellman. At the time…
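To illustrate how dynamic programming solves an optimization problem through overlapping subproblems, here is a minimal Python sketch on a toy problem chosen for this example (fewest coins to make an amount; the function name `min_coins` and the coin values are hypothetical). Each call `best(rest)` is a small Bellman-style recursion: pick one coin, then add the optimal value of the remaining subproblem, with memoization so each subproblem is solved only once.

```python
import math
from functools import lru_cache


def min_coins(amount: int, coins: tuple[int, ...]) -> int:
    """Fewest coins (hypothetical denominations) summing to `amount`, or -1 if impossible."""

    @lru_cache(maxsize=None)
    def best(rest: int) -> float:
        # Base case: nothing left to pay costs zero coins.
        if rest == 0:
            return 0
        # Optimal substructure: try each coin as the first choice, then
        # solve the smaller remaining subproblem optimally (memoized).
        options = [1 + best(rest - c) for c in coins if c <= rest]
        # No usable coin means this subproblem is infeasible.
        return min(options, default=math.inf)

    result = best(amount)
    return -1 if result == math.inf else int(result)


print(min_coins(11, (1, 2, 5)))  # 3 (5 + 5 + 1)
print(min_coins(3, (2,)))        # -1
```

Memoization with `functools.lru_cache` reduces the naive exponential recursion to roughly `amount × len(coins)` work; a bottom-up table would give the same result without deep recursion for large amounts.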

Optimal control

Optimal control theory determines the control of a system that minimizes (or maximizes) a performance criterion, possibly under constraints that may apply to the control…
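One classic case where optimal control meets the Bellman equation is the linear-quadratic regulator. Below is a minimal scalar, finite-horizon sketch, assuming dynamics $x_{t+1} = a x_t + b u_t$ and cost $\sum_t (q x_t^2 + r u_t^2) + q_f x_N^2$ with $r > 0$ (the parameters and the function name `lqr_backward` are hypothetical). With the guess $V_t(x) = p_t x^2$, each Bellman step minimizes a quadratic in $u$ in closed form, which yields the backward Riccati recursion in the loop.

```python
def lqr_backward(a, b, q, r, q_final, horizon):
    """Scalar finite-horizon LQR by backward induction (hypothetical example).

    Returns gains k_t (optimal control u_t = -k_t * x_t) for t = 0..N-1
    and value coefficients p_t (V_t(x) = p_t * x**2) for t = 0..N.
    """
    p = q_final  # terminal condition: V_N(x) = q_final * x**2
    gains, values = [], [p]
    for _ in range(horizon):
        # Bellman step: minimize q x^2 + r u^2 + p (a x + b u)^2 over u.
        k = a * b * p / (r + b * b * p)
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        gains.append(k)
        values.append(p)
    # Lists were filled from t = N-1 down to t = 0; reverse into time order.
    return gains[::-1], values[::-1]


gains, values = lqr_backward(a=1.0, b=1.0, q=1.0, r=1.0, q_final=1.0, horizon=1)
print(gains, values)  # [0.5] [1.5, 1.0]
```

A useful sanity check is that simulating the closed loop $u_t = -k_t x_t$ from $x_0$ accumulates a total cost equal to $p_0 x_0^2$, the value predicted by the recursion.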

Markov decision process

In decision theory and probability theory, a Markov decision process (MDP) is a stochastic model in which an agent makes decisions and…
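Because the optimal value function of a finite MDP satisfies a Bellman optimality equation, it can be computed by value iteration, that is, by applying the Bellman backup repeatedly until the values stop changing. A minimal NumPy sketch, assuming the MDP is given as arrays `P[a, s, s']` (transition probabilities) and `R[s, a]` (expected rewards) with discount `gamma`; the function name and the two-state example are hypothetical.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Solve V(s) = max_a [R[s, a] + gamma * sum_s' P[a, s, s'] V(s')].

    P: shape (n_actions, n_states, n_states), each row sums to 1.
    R: shape (n_states, n_actions), expected immediate reward.
    Returns approximate optimal values and a greedy policy.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Action values under the current estimate: Q[s, a].
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)  # Bellman optimality backup
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)


# Hypothetical 2-state, 2-action MDP: action 0 = stay, action 1 = switch state.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # stay
    [[0.0, 1.0], [1.0, 0.0]],  # switch
])
# Reward 1 only for staying in state 1.
R = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
])
V, policy = value_iteration(P, R, gamma=0.9)
print(V, policy)  # V is approximately [9, 10]; policy is [1, 0]
```

Here the optimal behavior is to move from state 0 to state 1 and then stay, so $V(1) = 1/(1 - 0.9) = 10$ and $V(0) = 0.9 \times 10 = 9$, matching the fixed point of the backup.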

Related courses (20)

MGT-484: Applied probability & stochastic processes

This course focuses on dynamic models of random phenomena, and in particular, the most popular classes of such models: Markov chains and Markov decision processes. We will also study applications in queuing theory, finance, project management, etc.

CS-456: Artificial neural networks/reinforcement learning

Since 2010, approaches in deep learning have revolutionized fields as diverse as computer vision, machine learning, and artificial intelligence. This course gives a systematic introduction to influential models of deep artificial neural networks, with a focus on reinforcement learning.

CS-250: Algorithms

The students learn the theory and practice of basic concepts and techniques in algorithms. The course covers mathematical induction, techniques for analyzing algorithms, elementary data structures, major algorithmic paradigms such as dynamic programming, sorting and searching, and graph algorithms.

We apply a Gaussian variational approximation to model reduction in large biochemical networks of unary and binary reactions. We focus on a small subset of variables (subnetwork) of interest, e.g. because they are accessible experimentally, embedded in a larger network (bulk). The key goal is to write dynamical equations reduced to the subnetwork but still retaining the effects of the bulk. As a result, the subnetwork-reduced dynamics contains a memory term and an extrinsic noise term with non-trivial temporal correlations. We first derive expressions for this memory and noise in the linearized (Gaussian) dynamics and then use a perturbative power expansion to obtain first order nonlinear corrections. For the case of vanishing intrinsic noise, our description is explicitly shown to be equivalent to projection methods up to quadratic terms, but it is applicable also in the presence of stochastic fluctuations in the original dynamics. An example from the epidermal growth factor receptor signalling pathway is provided to probe the increased prediction accuracy and computational efficiency of our method.

Prescribing optimal operation based on the condition of the system, and thereby potentially prolonging its remaining useful lifetime, has tremendous potential in terms of actively managing the availability, maintenance, and costs of complex systems. Reinforcement learning (RL) algorithms are particularly suitable for this type of problem given their learning capabilities. A special case of a prescriptive operation is the power allocation task, which can be considered a sequential allocation problem whereby the action space is bounded by a simplex constraint. A general continuous action-space solution of such sequential allocation problems has remained an open research question for RL algorithms. In continuous action space, the standard Gaussian policy applied in reinforcement learning does not support simplex constraints, while the Gaussian-softmax policy introduces a bias during training. In this work, we propose the Dirichlet policy for continuous allocation tasks and analyze the bias and variance of its policy gradients. We demonstrate that the Dirichlet policy is bias-free and provides significantly faster convergence, better performance, and better robustness to hyperparameter changes compared with the Gaussian-softmax policy. Moreover, we demonstrate the applicability of the proposed algorithm on a prescriptive operation case in which we propose the Dirichlet power allocation policy and evaluate its performance on a case study of a set of multiple lithium-ion (Li-I) battery systems. The experimental results demonstrate the potential to prescribe optimal operation, improving the efficiency and sustainability of multi-power source systems.

2022

Related lecture sessions (49)

Revealing mechanism bifurcation is essential in the design and analysis of reconfigurable mechanisms. First- and second-order methods have successfully revealed the bifurcation of mechanisms. However, they fail for the novel Schatz-inspired metamorphic mechanisms presented in this paper. Here, we present a third- and fourth-order method for revealing their bifurcation using screw theory. The constraint equations derived from the first- and second-order kinematics yield only one linearly independent relationship between joint angular velocities at the singular configuration of the new mechanism, which means the bifurcation cannot be revealed in this way. Therefore, we calculate constraint equations from the third- and fourth-order kinematics and obtain two linearly independent relationships between joint angular accelerations at the same singular configuration, corresponding to different curvatures of the kinematic curves of two motion branches in the configuration space. Moreover, motion branches in Schatz-inspired metamorphic mechanisms are demonstrated. (C) 2020 Elsevier Ltd. All rights reserved.