**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.

Publication# Simple learning rules to cope with changing environments

Abstract

We consider an agent that must choose repeatedly among several actions. Each action has a certain probability of giving the agent an energy reward, and costs may be associated with switching between actions. The agent does not know which action has the highest reward probability, and the probabilities change randomly over time. We study two learning rules that have been widely used to model decision-making processes in animals—one deterministic and one stochastic. In particular, we examine the influence of the rules’ “learning rate” on the agent’s energy gain. We compare the performance of each rule with the best performance attainable when the agent has either full knowledge or no knowledge of the environment. Over relatively short periods of time both rules are successful in enabling agents to exploit their environment. Moreover, under a range of effective learning rates, both rules are equivalent, and can be expressed by a third rule that requires the agent to select the action for which the current run of unsuccessful trials is shortest. However, the performance of both rules is relatively poor over longer periods of time, and under most circumstances no better than the performance an agent could achieve without knowledge of the environment. We propose a simple extension to the original rules that enables agents to learn about and effectively exploit a changing environment for an unlimited period of time.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related publications (33)

Related MOOCs (15)

Related concepts (32)

Neuronal Dynamics - Computational Neuroscience of Single Neurons

The activity of neurons in the brain and the code used by these neurons is described by mathematical neuron models at different levels of detail.

Neuronal Dynamics - Computational Neuroscience of Single Neurons

The activity of neurons in the brain and the code used by these neurons is described by mathematical neuron models at different levels of detail.

Instructional Design with Orchestration Graphs

Discover a visual language for designing pedagogical scenarios that integrate individual, team and class wide activities.

Machine learning

Machine learning (ML) is an umbrella term for solving problems for which development of algorithms by human programmers would be cost-prohibitive, and instead the problems are solved by helping machines 'discover' their 'own' algorithms, without needing to be explicitly told what to do by any human-developed algorithms. Recently, generative artificial neural networks have been able to surpass results of many previous approaches.

Probability

Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and 1, where, roughly speaking, 0 indicates impossibility of the event and 1 indicates certainty. The higher the probability of an event, the more likely it is that the event will occur. A simple example is the tossing of a fair (unbiased) coin.

Q-learning

Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations. For any finite Markov decision process (FMDP), Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over any and all successive steps, starting from the current state.

Since the birth of Information Theory, researchers have defined and exploited various information measures, as well as endowed them with operational meanings. Some were born as a "solution to a problem", like Shannon's Entropy and Mutual Information. Other ...

Volkan Cevher, Ali Kavis, Shaul Nadav Hallak

This paper analyzes the trajectories of stochastic gradient descent (SGD) to help understand the algorithm’s convergence properties in non-convex problems. We first show that the sequence of iterates generated by SGD remains bounded and converges with prob ...

2020Background Coercion in psychiatry is a controversial issue. Identifying its predictors and their interaction using traditional statistical methods is difficult, given the large number of variables involved. The purpose of this study was to use machine-lear ...