Learning how to act and adapting to unexpected changes are remarkable capabilities of humans and other animals. In the absence of a direct recipe to follow in life, behaviour is often guided by rewarding and by surprising events. A positive or negative outcome influences the tendency to repeat an action, and a sudden unexpected event signals the possible need to act differently or to update one's view of the world. Advances in computational, behavioural, and cognitive neuroscience have indicated that animals employ multiple strategies to learn from interaction, but our understanding of these strategies and of how they are combined remains limited. The main goal of this thesis is to study the use of surprise by ever-adapting biological agents, its contributions to reward-based learning, and its manifestation in the human brain.

We first study surprise from a theoretical perspective. In a probabilistic model of changing environments, we show that exact and approximate Bayesian inference give rise to a trade-off between forgetting old observations and integrating them with new ones, modulated by a naturally emerging surprise measure (a schematic form and a toy code illustration are sketched after the abstract). We develop novel surprise-based algorithms that adapt in the face of abrupt changes, accurately estimate the model of the world, and could potentially be implemented in the brain.

Next, we focus on the contributions of surprise-based model estimation to reinforcement learning. We couple one of our adaptive algorithms, as well as simpler non-adaptive methods, with reinforcement learning agents and evaluate their performance on environments with different characteristics. Abrupt changes that directly affect the agent's policy call for surprise-based adaptation to achieve high performance. Often, however, the agent does not need to invest in maintaining an accurate model of the environment to obtain high reward: in stochastic environments or in environments with distal changes, simpler methods equipped with exploration capacity perform as well as more elaborate ones.

Finally, we turn to human learning behaviour and to brain signals of surprise- and reward-based learning. We design a novel multi-step sequential decision-making task in which the strategic use of surprising events allows us to dissociate fMRI correlates of reward learning from those of model estimation. We show that Bayesian inference on this task leads to the same surprise measure found earlier, where the trade-off is now between ignoring new observations and integrating them with the old belief, and we develop reinforcement learning algorithms that perform outlier detection via this surprise-modulated trade-off. At the level of behaviour, we find evidence for a model-free policy learning architecture with potential influences from a model estimation system. At the level of brain responses, we identify signatures of both reward and model estimation signals, supporting the existence of multiple parallel learning systems in the brain.

This thesis presents a comparative analysis of surprise-based model estimation methods in theory and in simulations, provides insights into the types of approximations that biological agents may adopt, and identifies signatures of model estimation in the human brain. Our results may aid future work aimed at building efficient adaptive agents and at understanding the learning algorithms and surprise measures implemented in the brain.
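To make the surprise-modulated trade-off concrete, the following is a schematic sketch under the standard hidden-change-point assumption; the notation (belief $\pi_t$, surprise $S_t$, change probability $p_c$) is illustrative rather than quoted from the thesis. The new belief is a weighted combination of integrating the latest observation into the old belief and resetting to the prior, with a weight that saturates as surprise grows:

\[
  \pi_t(\theta) \;=\; (1-\gamma_t)\,\pi_t^{\mathrm{integrate}}(\theta) \;+\; \gamma_t\,\pi_t^{\mathrm{reset}}(\theta),
  \qquad
  \gamma_t \;=\; \frac{m\,S_t}{1+m\,S_t},
  \qquad
  m \;=\; \frac{p_c}{1-p_c},
\]

where $S_t$ compares how well the prior predicts the newest observation against how well the current belief does: $S_t \gg 1$ after an abrupt change pushes $\gamma_t \to 1$ (forgetting), while $S_t \ll 1$ keeps $\gamma_t$ near $0$ (integration).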
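As a toy illustration of how such a surprise-modulated update recovers from abrupt changes, the Python sketch below tracks the latent mean of a Gaussian signal with a hidden change point. Everything here is an assumption made for illustration (the Gaussian task, all parameter values, and the collapse of the two-component posterior mixture to a single Gaussian); it is not the thesis's actual algorithm.

import numpy as np

def gauss_pdf(x, mean, var):
    """Density of a univariate Gaussian."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def run_demo(seed=0, p_change=0.1):
    rng = np.random.default_rng(seed)
    prior_mean, prior_var, obs_var = 0.0, 4.0, 1.0
    m = p_change / (1.0 - p_change)   # assumed odds of a hidden change point
    mu, var = prior_mean, prior_var   # current belief about the latent mean

    true_mean = 3.0
    for t in range(200):
        if t == 100:                  # abrupt, unsignalled change
            true_mean = -3.0
        y = true_mean + rng.normal(0.0, np.sqrt(obs_var))

        # Surprise: how much better the prior explains y than the current belief
        s = gauss_pdf(y, prior_mean, prior_var + obs_var) / gauss_pdf(y, mu, var + obs_var)
        gamma = m * s / (1.0 + m * s)  # surprise-modulated trade-off weight

        # Conjugate Gaussian update of the current belief ("integrate") ...
        k = var / (var + obs_var)
        mu_int, var_int = mu + k * (y - mu), (1.0 - k) * var
        # ... and of a belief restarted from the prior ("forget")
        k0 = prior_var / (prior_var + obs_var)
        mu_rst, var_rst = prior_mean + k0 * (y - prior_mean), (1.0 - k0) * prior_var

        # Blend the two updates; collapsing the mixture to a single Gaussian by
        # averaging means and variances is a deliberate simplification here
        mu = (1.0 - gamma) * mu_int + gamma * mu_rst
        var = (1.0 - gamma) * var_int + gamma * var_rst

    print(f"estimate after change: {mu:+.2f} (true mean: {true_mean:+.2f})")

run_demo()

In this toy run the estimate first settles near the initial mean, then re-converges to the new mean shortly after the change point, since the surprise ratio spikes and the belief is largely reset; a fixed small learning rate would adapt far more slowly.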