How do animals learn to repeat behaviors that lead to obtaining food or other “rewarding” objects? As a biologically plausible paradigm for learning in spiking neural networks, spike-timing-dependent plasticity (STDP) has been shown to perform well in unsupervised learning tasks such as receptive field development. However, STDP fails to take behavioral relevance into account, and as such is inadequate to explain a vast range of learning tasks in which the final outcome, conditioned on the prior execution of a series of actions, is signaled to an animal through sparse rewards. In this thesis, I show that the addition of a third, global, reward-based factor to the pre- and postsynaptic factors of STDP is a promising solution to this problem, consistent with experimental findings. On the one hand, dopamine is a neuromodulator which has been shown to encode reward signals in the brain. On the other hand, STDP has been shown to be affected by dopamine, even though the precise nature of the interaction is unclear. Moreover, the theoretical framework of reinforcement learning provides a strong foundation for the analysis of these learning rules. After studying existing examples of such rules in a navigation task, I derive simple functional requirements for reward-modulated learning rules, and illustrate these in a motor learning task. One of those functional requirements is the existence of a “critic” structure, constantly evaluating the potential for rewarding events. The implications of the existence of such a critic for the interpretation of psychophysical experiments are also discussed. Finally, I propose a biologically plausible implementation of such a structure that performs motor or navigational tasks. This is based on a generalization of temporal difference learning, a well-known reinforcement learning framework, to continuous time, which is well suited to an implementation with spiking neurons. These results provide a unified picture of reward-modulated learning rules: although different rules have been proposed, they can be reduced to a single model at the synaptic level, with variations in the computation of the neuromodulatory signal enabling switching between different learning rules.
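To make the three-factor idea concrete, the following is a minimal sketch of a reward-modulated STDP update, assuming an exponential STDP window and a single eligibility trace per synapse; all parameter names and values here are illustrative and are not taken from the thesis.

```python
import numpy as np

# Minimal sketch of a three-factor (reward-modulated) STDP rule.
# STDP-like coincidences set an eligibility trace; a global, scalar
# neuromodulatory signal (the "third factor") converts it into weight changes.
# Parameters (a_plus, a_minus, tau_e, ...) are illustrative assumptions.

rng = np.random.default_rng(0)

n_pre, n_post = 50, 10
dt = 1e-3            # simulation time step (s)
tau_pre = 20e-3      # decay of presynaptic spike trace (STDP window)
tau_post = 20e-3     # decay of postsynaptic spike trace
tau_e = 0.5          # decay of the synaptic eligibility trace
a_plus, a_minus = 1.0, -1.0
lr = 1e-3            # learning rate

w = rng.uniform(0.0, 0.5, size=(n_post, n_pre))   # synaptic weights
x_pre = np.zeros(n_pre)       # presynaptic spike trace
x_post = np.zeros(n_post)     # postsynaptic spike trace
elig = np.zeros_like(w)       # eligibility trace (candidate weight change)

def step(pre_spikes, post_spikes, reward):
    """One update: STDP writes to the eligibility trace, the global
    reward signal gates whether it becomes an actual weight change."""
    global x_pre, x_post, elig, w
    # exponential decay of spike traces and eligibility trace
    x_pre += -x_pre * dt / tau_pre + pre_spikes
    x_post += -x_post * dt / tau_post + post_spikes
    elig += -elig * dt / tau_e
    # STDP contribution: pre-before-post potentiates, post-before-pre depresses
    elig += a_plus * np.outer(post_spikes, x_pre)    # post spike, recent pre
    elig += a_minus * np.outer(x_post, pre_spikes)   # pre spike, recent post
    # third factor: global neuromodulatory signal gates plasticity
    w += lr * reward * elig

# Toy usage: random spikes, with reward delivered only occasionally (sparse reward).
for t in range(1000):
    pre = (rng.random(n_pre) < 0.02).astype(float)
    post = (rng.random(n_post) < 0.02).astype(float)
    r = 1.0 if t % 200 == 199 else 0.0
    step(pre, post, r)
```

In this sketch, the synaptic update is the same for all rule variants; changing what the scalar third factor encodes, for example a raw reward, a reward minus a running baseline, or a TD error produced by a critic, is what switches between the different learning rules the abstract refers to.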