Whether we prepare a coffee or navigate to a shop, many tasks require multiple decisions before a goal is reached. Learning such state-action sequences from sparse reward raises the credit-assignment problem: which actions out of a long sequence should be reinforced? One solution provided by reinforcement learning (RL) theory is the eligibility trace (ET), a decaying memory of the state-action history. Here we investigate, behaviorally and neurally, whether humans utilize an ET when learning a multi-step decision-making task. We implemented three versions of a novel task using visual, acoustic, and spatial cues. Eleven subjects performed all three conditions while we recorded their pupil diameter. We considered model-based and model-free (with and without ET) algorithms to explain human learning. Using the Akaike Information Criterion (AIC), we find that model-free learning with ET best explains human behavior in all three conditions. Cross-validation confirms this behavioral result. We then compare pupil dilation in early and late learning and observe differences that are consistent with an ET contribution. In particular, we find significant changes in pupil response to non-goal states after just a single reward in all three experimental conditions. In this research we introduce a novel paradigm to study the ET in human learning in a multi-step sequential decision-making task. The analysis of the behavioral and pupil data provides evidence that humans utilize an eligibility trace to solve the credit-assignment problem when learning from sparse and delayed reward.
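To illustrate how an eligibility trace addresses credit assignment, the following is a minimal sketch of a tabular SARSA(λ) update with accumulating traces. It is not the model fitted in the study: the three-step path, the function name `run_episode`, and the values of `alpha`, `gamma`, and `lam` are all assumptions chosen for illustration. It shows that after a single rewarded episode, earlier non-goal state-action pairs are updated only when the trace is active (`lam > 0`).

```python
import numpy as np

def run_episode(q, path, reward, alpha=0.5, gamma=0.9, lam=0.0):
    """Apply SARSA(lambda) updates along one fixed state-action path.

    path:   list of (state, action) pairs; the last pair reaches the goal.
    reward: delivered only on the final transition (sparse reward).
    All parameter values are illustrative assumptions.
    """
    trace = np.zeros_like(q)  # eligibility trace, one entry per (state, action)
    for t, (s, a) in enumerate(path):
        last = t == len(path) - 1
        r = reward if last else 0.0
        next_q = 0.0 if last else q[path[t + 1]]  # value of the next pair on the path
        delta = r + gamma * next_q - q[s, a]      # temporal-difference error
        trace[s, a] += 1.0                        # mark the visited pair as eligible
        q += alpha * delta * trace                # update every eligible pair
        trace *= gamma * lam                      # decay the memory of past pairs
    return q

# Hypothetical path: three states visited in order, always taking action 0.
path = [(0, 0), (1, 0), (2, 0)]
q_no_et = run_episode(np.zeros((3, 2)), path, reward=1.0, lam=0.0)
q_et = run_episode(np.zeros((3, 2)), path, reward=1.0, lam=0.8)

print(q_no_et[:, 0])  # only the final pair has a non-zero value
print(q_et[:, 0])     # earlier pairs also receive credit, scaled by the decayed trace
```

With `lam=0.0` the update reduces to one-step SARSA, so credit reaches earlier states only over repeated episodes; with `lam > 0` a single reward changes the values of all recently visited pairs.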
Martina Fantin, Jocelin Grosse, Maria del Carmen Sandi Perez, Michael van der Kooij