In reinforcement learning, an agent makes sequential decisions to maximize reward. During learning, the actual and expected outcomes are compared to tell whether a decision was good or bad. The difference between the actual and expected outcome is the prediction error. Prediction errors can be categorised into state prediction errors (SPE) and reward prediction errors (RPE), which can serve as teaching signals in reinforcement learning. fMRI studies have revealed the brain areas where the reward prediction error and the state prediction error are computed (Haruno & Kawato 2006; McClure et al. 2003; O’Doherty et al. 2003; D’Ardenne et al. 2008; Glascher et al. 2010). Here, using 128-channel EEG, we show when the SPE and RPE are computed. In our study, participants saw an image on the computer screen and were asked to click one of three or four buttons, which, depending on the choice, led to the presentation of a new image until a goal image was reached. After participants had learned the path to the goal, we swapped two images. The swapped images created an SPE, which was correlated with a significant change in the frontal N1 component of the Event-Related Potential (ERP). To estimate the RPE, we fitted participants’ performance with the reinforcement learning algorithm SARSA(λ). An ERP time window at 200-400 ms reflected the magnitudes of this algorithm’s RPEs well (r = 0.51, p = 0.02). Our results show that the frontal P3 component of the ERP reflects the reward prediction process, while the state prediction process is reflected by the frontal N1 component, which is in line with mismatch negativity studies (Campbell et al. 2007).
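To make the RPE concrete, the following is a minimal sketch (not the authors' code) of how a reward prediction error arises in a SARSA(λ) update: the actual outcome (reward plus the discounted value of the next state-action pair) is compared with the expected outcome Q[s, a]. The task size, parameter values, and function names are illustrative assumptions.

```python
# Minimal SARSA(lambda) sketch; all names and parameter values are assumptions.
import numpy as np

n_states, n_actions = 10, 4          # hypothetical task size
alpha, gamma, lam = 0.1, 0.9, 0.8    # learning rate, discount, trace decay (assumed)

Q = np.zeros((n_states, n_actions))  # expected outcomes (action values)
E = np.zeros_like(Q)                 # eligibility traces


def sarsa_lambda_step(s, a, r, s_next, a_next):
    """One SARSA(lambda) update; returns the reward prediction error (RPE)."""
    # RPE: actual outcome (reward + discounted value of the next state-action)
    # minus the expected outcome Q[s, a].
    rpe = r + gamma * Q[s_next, a_next] - Q[s, a]
    E[s, a] += 1.0                    # mark the visited state-action pair
    Q[:] += alpha * rpe * E           # propagate the error along the traces
    E[:] *= gamma * lam               # decay all traces
    return rpe
```

In such a fit, the per-trial RPE values returned by the update are the quantities that can then be compared with the ERP amplitudes in the chosen time window.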