When humans or animals perform an action that leads to a desired outcome, they tend to repeat it. The mechanisms underlying learning from past experience and adapting future behavior are still not fully understood. In this thesis, I study how humans learn from sparse and delayed reward during multi-step tasks. Learning a sequence of multiple decisions from a reward obtained only at the end of the sequence requires a mechanism that links earlier actions to later reward. The theory of reinforcement learning suggests an algorithmic solution to this problem, namely, to keep a decaying memory of the state-action history. Such memories are called eligibility traces. They bridge the temporal delay between the moment an action is taken and a subsequent reward. We ask whether humans make use of eligibility traces when learning a sequential decision-making task. The difficulty in answering this question is that different competing algorithmic solutions make similar predictions about behavior. Only during a few initial trials is learning with eligibility traces qualitatively different from learning with other algorithms. Here, I implemented a novel learning task with an experimental manipulation that allowed us to guide participants through a controlled sequence of states. With this hidden manipulation, we were able to isolate the specific trials in which the competing models are distinguishable. Behavioral data as well as simultaneously recorded pupil dilation revealed effects compatible with eligibility traces, but not with simpler models. Furthermore, the trial-by-trial reward prediction errors were correlated with pupil dilation and EEG measurements. Our experimental data show effects of eligibility traces in behavior and pupil data after a single experience of state-action associations, which has not been studied before in a multi-step task. We view our results in the light of one-shot learning and as a signature of a learning mechanism present in both temporal difference learning and one-shot learning.
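To make the eligibility-trace mechanism described above concrete, the following is a minimal sketch of tabular learning with accumulating traces (SARSA(lambda)-style). It is only an illustration of the general algorithmic idea of a decaying state-action memory, not the specific model fitted in the thesis; the state and action space sizes and the parameters alpha (learning rate), gamma (discount), and lam (trace decay) are hypothetical choices.

import numpy as np

n_states, n_actions = 6, 2
alpha, gamma, lam = 0.1, 0.95, 0.8

Q = np.zeros((n_states, n_actions))   # estimated action values
e = np.zeros((n_states, n_actions))   # eligibility traces over state-action pairs

def update(s, a, r, s_next, a_next, done):
    """One learning step: the trace tags recently visited state-action pairs
    so that a delayed reward can be credited back to earlier decisions."""
    global Q, e
    td_error = r + (0.0 if done else gamma * Q[s_next, a_next]) - Q[s, a]
    e[s, a] += 1.0              # mark the current state-action pair as eligible
    Q += alpha * td_error * e   # credit all eligible pairs in proportion to their trace
    e *= gamma * lam            # traces decay with temporal distance from the reward
    if done:
        e[:] = 0.0              # reset traces at the end of an episode

The key point is the line Q += alpha * td_error * e: when a reward arrives only at the end of a multi-step sequence, the decaying trace e still assigns part of the credit to earlier actions, which is exactly what a one-step update without traces cannot do.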