Publication

Surprise-based model estimation in reinforcement learning: algorithms and brain signatures

Vasiliki Liakoni
2021
EPFL thesis
Abstract

Learning how to act and adapting to unexpected changes are remarkable capabilities of humans and other animals. In the absence of a direct recipe to follow in life, behaviour is often guided by rewarding and by surprising events. A positive or a negative outcome influences the tendency to repeat some actions, and a sudden unexpected event signals a possible need to act differently or to update one's view of the world. Advances in computational, behavioural, and cognitive neuroscience have indicated that animals employ multiple strategies to learn from interaction; however, our understanding of these learning strategies, and of how they are combined, remains limited. The main goal of this thesis is to study the use of surprise by ever-adapting biological agents, its contributions to reward-based learning, and its manifestation in the human brain.

We first study surprise from a theoretical perspective. In a probabilistic model of changing environments, we show that exact and approximate Bayesian inference give rise to a trade-off between forgetting old observations and integrating them with new ones, modulated by a naturally emerging surprise measure. We develop novel surprise-based algorithms that adapt in the face of abrupt changes, accurately estimate the model of the world, and could potentially be implemented in the brain.

Next, we focus on the contributions of surprise-based model estimation to reinforcement learning. We couple one of our adaptive algorithms, as well as simpler non-adaptive methods, with reinforcement learning agents and evaluate their performance on environments with different characteristics. Abrupt changes that directly affect the agent's policy call for surprise-based adaptation in order to achieve high performance. Often, however, the agent does not need to invest in maintaining an accurate model of the environment to obtain high reward levels: in stochastic environments or in environments with distal changes, simpler methods equipped with exploration capacity perform as well as more elaborate methods.

Finally, we turn to human learning behaviour and to brain signals of surprise- and reward-based learning. We design a novel multi-step sequential decision-making task in which strategic use of surprising events allows us to dissociate fMRI correlates of reward learning from those of model estimation. We show that Bayesian inference on this task leads to the same surprise measure we found earlier, where the trade-off is now between ignoring new observations and integrating them with the old belief, and we develop reinforcement learning algorithms that perform outlier detection via this surprise-modulated trade-off. At the level of behaviour, we find evidence for a model-free policy learning architecture with potential influences from a model estimation system. At the level of brain responses, we identify signatures of both reward and model estimation signals, supporting the existence of multiple parallel learning systems in the brain.

This thesis presents a comparative analysis of surprise-based model estimation methods in theory and simulations, provides insights into the types of approximations that biological agents may adopt, and identifies signatures of model estimation in the human brain. Our results may aid future work aiming at building efficient adaptive agents and at understanding the learning algorithms and the surprise measures implemented in the brain.
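As a rough illustration of the trade-off described in the abstract, the sketch below implements a surprise-modulated belief update for a toy Gaussian mean-estimation task whose latent mean changes abruptly. It is not the thesis's algorithm: the Gaussian task, the Bayes-factor-style surprise ratio, the adaptation rate gamma, the parameter p_change, and the moment-interpolation step are illustrative assumptions made here, intended only to mimic, in spirit, a surprise-modulated trade-off between forgetting old observations and integrating them with new ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def norm_pdf(y, mu, var):
    """Density of a univariate Gaussian N(mu, var) evaluated at y."""
    return np.exp(-0.5 * (y - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def gaussian_update(mu, var, y, var_obs):
    """Conjugate update of a Gaussian belief over a latent mean,
    given one observation y with known observation variance var_obs."""
    k = var / (var + var_obs)               # Kalman-like gain
    return mu + k * (y - mu), (1.0 - k) * var

def surprise_modulated_step(mu, var, y, mu0, var0, var_obs, p_change):
    """One belief update: a surprise-modulated interpolation between
    integrating y into the current belief and restarting from the prior."""
    p_belief = norm_pdf(y, mu, var + var_obs)    # y as predicted by the current belief
    p_prior = norm_pdf(y, mu0, var0 + var_obs)   # y as predicted if a change just happened
    surprise = p_prior / (p_belief + 1e-300)     # Bayes-factor-style surprise (assumed form)
    m = p_change / (1.0 - p_change)
    gamma = m * surprise / (1.0 + m * surprise)  # adaptation rate in [0, 1)
    mu_keep, var_keep = gaussian_update(mu, var, y, var_obs)      # integrate with old belief
    mu_reset, var_reset = gaussian_update(mu0, var0, y, var_obs)  # forget and restart from prior
    # Crude moment interpolation between the two candidate posteriors,
    # used here only to illustrate the trade-off.
    mu_new = (1.0 - gamma) * mu_keep + gamma * mu_reset
    var_new = (1.0 - gamma) * var_keep + gamma * var_reset
    return mu_new, var_new, gamma

# Toy simulation: the latent mean jumps abruptly halfway through.
mu0, var0, var_obs, p_change = 0.0, 4.0, 1.0, 0.01
true_mean = np.concatenate([np.full(100, 1.0), np.full(100, -3.0)])
mu, var = mu0, var0
for t, m_true in enumerate(true_mean):
    y = rng.normal(m_true, np.sqrt(var_obs))
    mu, var, gamma = surprise_modulated_step(mu, var, y, mu0, var0, var_obs, p_change)
    if t in (99, 100, 101, 199):
        print(f"t={t:3d}  estimate={mu:+.2f}  adaptation rate={gamma:.2f}")
```

In this sketch, when the current belief explains the data well, gamma stays near zero and the update reduces to ordinary Bayesian integration; when an observation is far better explained by the prior, gamma approaches one and the belief is effectively reset.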

Related concepts (36)
Reinforcement learning
Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning. Reinforcement learning differs from supervised learning in not needing labelled input/output pairs to be presented, and in not needing sub-optimal actions to be explicitly corrected.
Deep reinforcement learning
Deep reinforcement learning (deep RL) is a subfield of machine learning that combines reinforcement learning (RL) and deep learning. RL considers the problem of a computational agent learning to make decisions by trial and error. Deep RL incorporates deep learning into the solution, allowing agents to make decisions from unstructured input data without manual engineering of the state space. Deep RL algorithms are able to take in very large inputs.
Q-learning
Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations. For any finite Markov decision process (FMDP), Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over any and all successive steps, starting from the current state.
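As a companion to the Q-learning entry above, here is a minimal tabular sketch of the update rule it describes; the toy environment interface, the learning rate alpha, and the discount gamma are illustrative placeholders, not part of the publication.

```python
from collections import defaultdict

# Q[state][action] -> current estimate of the action value.
Q = defaultdict(lambda: defaultdict(float))

def q_learning_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One model-free Q-learning step: move Q(s, a) toward the bootstrapped
    target r + gamma * max_a' Q(s', a'), without using a model of the environment."""
    best_next = max(Q[next_state].values(), default=0.0)
    td_target = reward + gamma * best_next
    Q[state][action] += alpha * (td_target - Q[state][action])

# Example transition: after taking action "right" in state "s0", the agent
# received reward 1.0 and landed in state "s1".
q_learning_update(Q, "s0", "right", 1.0, "s1")
```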
Related publications (94)

Seeking the new, learning from the unexpected: Computational models of surprise and novelty in the brain

Alireza Modirshanechi

Human babies have a natural desire to interact with new toys and objects, through which they learn how the world around them works, e.g., that glass shatters when dropped, but a rubber ball does not. When their predictions are proven incorrect, such as whe ...
EPFL, 2024

Unveiling the complexity of learning and decision-making

Wei-Hsiang Lin

Reinforcement learning (RL) is crucial for learning to adapt to new environments. In RL, the prediction error is an important component that compares the expected and actual rewards. Dopamine plays a critical role in encoding these prediction errors. In my ...
EPFL, 2024

Multi-agent reinforcement learning with graph convolutional neural networks for optimal bidding strategies of generation units in electricity markets

Olga Fink, Mina Montazeri

Finding optimal bidding strategies for generation units in electricity markets would result in higher profit. However, it is a challenging problem because of system uncertainty, which stems from the lack of knowledge of the strategies of other generation uni ...
Pergamon-Elsevier Science Ltd, 2023
Related MOOCs (32)
Neuronal Dynamics - Computational Neuroscience of Single Neurons
The activity of neurons in the brain and the code used by these neurons is described by mathematical neuron models at different levels of detail.
Neuronal Dynamics 2- Computational Neuroscience: Neuronal Dynamics of Cognition
This course explains the mathematical and computational models that are used in the field of theoretical neuroscience to analyze the collective dynamics of thousands of interacting neurons.
