Publication

Rollout sampling approximate policy iteration

Related publications (37)

Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning

Rachid Guerraoui, El Mahdi El Mhamdi, Alexandre David Olivier Maurer, Hadrien Hendrikx

In reinforcement learning, agents learn by performing actions and observing their outcomes. Sometimes, it is desirable for a human operator to interrupt an agent in order to prevent dangerous situations from happening. Yet, as part of their learni ...

EPFL, 2017

Exploring Feature Dimensions to Learn a New Policy in an Uninformed Reinforcement Learning Task

Oh-Hyeon Choung

When making a choice with limited information, we explore new features through trial-and-error to learn how they are related. However, few studies have investigated exploratory behaviour when information is limited. In this study, we address, at both the b ...

Springer Nature, 2017

Reinforcement learning: the effect of environment

Michael Herzog, He Xu

Reinforcement learning is a type of supervised learning, where reward is sparse and delayed. For example, in chess, a series of moves is made until a sparse reward (win, loss) is issued, which makes it impossible to evaluate the value of a single move. Stil ...

2016

On the policy space of smart specialization strategies

Dominique Foray

This paper is about 'smart specialization strategies' as an innovation (or industrial) policy approach. Being a sector non-neutral policy, while promoting a bottom-up principle of entrepreneurial initiative and dynamics, 'smart specialization strategies' occ ...

2016

Integrated Transport and Land Use Modeling for Sustainable Cities

Michel Bierlaire, Ricardo Daniel Hurtubia González

Integrated transport and land use models are an increasingly used tool for evaluation of urban policy and large scale projects. Although there is a well-built theoretical background supporting the existing models, there are few exhaustive descriptions of t ...

EPFL Press, 2015

Learning non-parametric basis independent models from point queries via low-rank methods

Volkan Cevher, Hemant Tyagi

We consider the problem of learning multi-ridge functions of the form f(x) = g(Ax) from point evaluations of f. We assume that the function f is defined on an ℓ2-ball in R^d, g is twice continuously differentiable almost everywhere, and A is an element ...

Elsevier, 2014

From smart specialisation to smart specialisation policy

Dominique Foray

Purpose – The purpose of this paper is to focus on the distinction between smart specialisation and smart specialisation policy and it studies under what conditions a smart specialisation policy is necessary. Design/methodology/approach – A conceptual fram ...

2014

Robust Markov Decision Processes

Daniel Kuhn, Wolfram Wiesemann

Markov decision processes (MDPs) are powerful tools for decision making in uncertain dynamic environments. However, the solutions of MDPs are of limited practical use because of their sensitivity to distributional model parameters, which are typically unkn ...

2013

Autonomous reinforcement learning with experience replay

Ajay Kumar Tanwani

This paper considers the issues of efficiency and autonomy that are required to make reinforcement learning suitable for real-life control tasks. A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the ...

Pergamon-Elsevier Science Ltd, 2013

Iterative Learning of Feed-Forward Corrections for High-Performance Tracking

We revisit a recently developed iterative learning algorithm that enables systems to learn from a repeated operation with the goal of achieving high tracking performance of a given trajectory. The learning scheme is based on a coarse dynamics model of the ...

2012