Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
An agent that deviates from a usual or previous course of action can be said to display novel or varying behaviour. Novelty of behaviour can be seen as the result of real or apparent randomness in decision making, which prevents an agent from repeating exactly past choices. In this paper, novelty of behaviour is considered as an evolutionary precursor of the exploring skill in reward learning, and conservative behaviour as the precursor of exploitation. Novelty of behaviour in neural control is hypothesised to be an important factor in the neuro-evolution of operant reward learning. Agents capable of varying behaviour, as opposed to conservative, when exposed to reward stimuli appear to acquire on a faster evolutionary scale the meaning and use of such reward information. The hypothesis is validated by comparing the performance during evolution in two environments that either favour or are neutral to novelty. Following these findings, we suggest that neuro-evolution of operant reward learning is fostered by environments where behavioural novelty is intrinsically beneficial, i.e. where varying or exploring behaviour is associated with low risk.
Ali H. Sayed, Mert Kayaalp, Virginia Bordignon