Publication

Identifiability and Generalizability from Multiple Experts in Inverse Reinforcement Learning

Abstract

While Reinforcement Learning (RL) aims to train an agent from a reward function in a given environment, Inverse Reinforcement Learning (IRL) seeks to recover the reward function from observing an expert’s behavior. It is well known that, in general, various reward functions can lead to the same optimal policy, and hence, IRL is ill-defined. However, [1] showed that, if we observe two or more experts with different discount factors or acting in different environments, the reward function can under certain conditions be identified up to a constant. This work starts by showing an equivalent identifiability statement from multiple experts in tabular MDPs based on a rank condition, which is easily verifiable and is shown to be also necessary. We then extend our result to various different scenarios, i.e., we characterize reward identifiability in the case where the reward function can be represented as a linear combination of given features, making it more interpretable, or when we have access to approximate transition matrices. Even when the reward is not identifiable, we provide conditions characterizing when data on multiple experts in a given environment allows to generalize and train an optimal agent in a new environment. Our theoretical results on reward identifiability and generalizability are validated in various numerical experiments.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Ontological neighbourhood
Related concepts (34)
Reinforcement learning
Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning. Reinforcement learning differs from supervised learning in not needing labelled input/output pairs to be presented, and in not needing sub-optimal actions to be explicitly corrected.
Reward system
The reward system (the mesocorticolimbic circuit) is a group of neural structures responsible for incentive salience (i.e., "wanting"; desire or craving for a reward and motivation), associative learning (primarily positive reinforcement and classical conditioning), and positively-valenced emotions, particularly ones involving pleasure as a core component (e.g., joy, euphoria and ecstasy). Reward is the attractive and motivational property of a stimulus that induces appetitive behavior, also known as approach behavior, and consummatory behavior.
Linear combination
In mathematics, a linear combination is an expression constructed from a set of terms by multiplying each term by a constant and adding the results (e.g. a linear combination of x and y would be any expression of the form ax + by, where a and b are constants). The concept of linear combinations is central to linear algebra and related fields of mathematics. Most of this article deals with linear combinations in the context of a vector space over a field, with some generalizations given at the end of the article.
Show more
Related publications (40)

Seeking the new, learning from the unexpected: Computational models of surprise and novelty in the brain

Alireza Modirshanechi

Human babies have a natural desire to interact with new toys and objects, through which they learn how the world around them works, e.g., that glass shatters when dropped, but a rubber ball does not. When their predictions are proven incorrect, such as whe ...
EPFL2024

The Societal and Scientific Importance of Inclusivity, Diversity, and Equity in Machine Learning for Chemistry

Daniel Probst

While the introduction of practical deep learning has driven progress across scientific fields, recent research highlighted that the requirement of deep learning for ever-increasing computational resources and data has potential negative impacts on the sci ...
2023

Optimal containment control for a class of heterogeneous multi-agent systems with actuator faults

Ju Wu, Tong Wang

This article investigates the optimal containment control problem for a class of heterogeneous multi-agent systems with time-varying actuator faults and unmatched disturbances based on adaptive dynamic programming. Since there exist unknown input signals i ...
WILEY2023
Show more
Related MOOCs (22)
Algebra (part 1)
Un MOOC francophone d'algèbre linéaire accessible à tous, enseigné de manière rigoureuse et ne nécessitant aucun prérequis.
Algebra (part 1)
Un MOOC francophone d'algèbre linéaire accessible à tous, enseigné de manière rigoureuse et ne nécessitant aucun prérequis.
Algebra (part 2)
Un MOOC francophone d'algèbre linéaire accessible à tous, enseigné de manière rigoureuse et ne nécessitant aucun prérequis.
Show more