Concept

Self-play

Self-play is a technique for improving the performance of reinforcement learning agents. Intuitively, agents learn to improve by playing "against themselves".

In multi-agent reinforcement learning experiments, researchers try to optimize the performance of a learning agent on a given task, in cooperation or competition with one or more other agents. These agents learn by trial and error, and researchers may choose to have the learning algorithm play the role of two or more of the different agents. When successfully executed, this technique has a double advantage. First, it provides a straightforward way to determine the actions of the other agents, resulting in a meaningful challenge. Second, it increases the amount of experience that can be used to improve the policy by a factor of two or more, since the viewpoints of each of the different agents can be used for learning.

Czarnecki et al. argue that most of the games that people play for fun are "Games of Skill", meaning games whose space of all possible strategies looks like a spinning top. In more detail, the space of strategies can be partitioned into sets $L_1, L_2, \dots, L_n$ such that for any $i < j$, $\pi_i \in L_i$, and $\pi_j \in L_j$, the strategy $\pi_j$ beats the strategy $\pi_i$. Then, in population-based self-play, if the population is larger than $\max_i |L_i|$, the algorithm would converge to the best possible strategy.

Self-play is used by the AlphaZero program to improve its performance in the games of chess, shogi, and Go. It is also used to train the Cicero AI system to outperform humans at the game of Diplomacy, and to train the DeepNash system to play the game Stratego.

Two variants differ in how the opponent is chosen:

Self-Play (SP): The agent is trained against itself. This yields an open-ended curriculum in which the opponent's strength always matches the agent's. It is susceptible to cycles in strategy space, because the agent can forget how to play against its prior versions.

Fictitious Self-Play (FSP): The agent is trained against a uniform distribution over all of its previous policies. This wastes a large number of interactions on weaker opponents.
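The difference between SP and FSP can be illustrated with a toy sketch on rock-paper-scissors. This is only an illustration under strong assumptions, not how any of the systems named above are trained: the learning step is replaced by an exact best response, policies are single pure actions, and the names `PAYOFF`, `best_response`, `run_self_play`, and `run_fictitious_self_play` are hypothetical helpers introduced here.

```python
from typing import List, Sequence

# Payoff to the row action when played against the column action.
# Actions: 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def best_response(opponent_counts: Sequence[int]) -> int:
    """Return the action with the highest total payoff against the
    opponent actions counted in `opponent_counts` (ties go to the
    lowest action index)."""
    def total_payoff(action: int) -> int:
        return sum(PAYOFF[action][opp] * n
                   for opp, n in enumerate(opponent_counts))
    return max(range(3), key=total_payoff)


def run_self_play(steps: int, first_action: int = 0) -> List[int]:
    """SP (toy version): each new policy responds only to the latest one."""
    history = [first_action]
    for _ in range(steps):
        counts = [0, 0, 0]
        counts[history[-1]] = 1  # the opponent is the most recent policy
        history.append(best_response(counts))
    return history


def run_fictitious_self_play(steps: int, first_action: int = 0) -> List[int]:
    """FSP (toy version): each new policy responds to a uniform mixture
    over all previous policies, i.e. to their empirical counts."""
    history = [first_action]
    for _ in range(steps):
        counts = [history.count(action) for action in range(3)]
        history.append(best_response(counts))
    return history


if __name__ == "__main__":
    print("SP: ", run_self_play(6))
    print("FSP:", run_fictitious_self_play(6))
```

In this sketch, `run_self_play` walks through rock, paper, scissors, rock, and so on indefinitely, which is the cycling behavior described above: each new policy beats its predecessor but loses to an older one. `run_fictitious_self_play` responds to the whole history instead, at the cost of spending part of every comparison on old policies.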

Official source

https://en.wikipedia.org/wiki/Self-play

Fusing Pre-existing Knowledge and Machine Learning for Enhanced Building Thermal Modeling and Control

Inverse design of metal-organic frameworks for direct air capture of CO2 via deep reinforcement learning

Lessons Learned from Data-Driven Building Control Experiments: Contrasting Gaussian Process-based MPC, Bilevel DeePC, and Deep Reinforcement Learning
