**Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?**

Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur GraphSearch.

Publication# Multi-Agent Learning for Resource Allocation Problems

Résumé

We analyze resource allocation problems where N independent agents want to access C resources. Each resource can be only accessed by one agent at a time. In order to use the resources efficiently, the agents need to coordinate their access. We focus on decentralized coordination, i.e. coordination without central authority. We analyze coordination mechanisms for two different kind of agents: 1) cooperative, who follow the prescribed protocol entirely; and 2) non-cooperative, who attempt to maximize their own utility. We propose a novel approach to achieve a fair and efficient resource allocation when the agents are cooperative. The agents access resources in slots. At the beginning of each slot, they observe a global coordination signal – a random integer 1 ≤ k ≤ K . The agents then learn a different allocation for each value of the coordination signal. When modeled as a game, the canonical solution to the resource allocation problem is the correlated equilibrium. In a correlated equilibrium, a “smart” coordination device chooses the actions for the “stupid” agents, who then have no incentive to deviate from these recommendations. In contrast, in our solution the coordination signal is “stupid”, since it is not specific to the game. The agents are “smart”, because they learn their strategy independently for each signal value. We show that our learning algorithm converges to an efficient resource allocation in polynomial time. The resulting allocation becomes more fair as the number of signals K increases. A non-cooperative, self-interested agent can exploit our cooperative allocation scheme by accessing one resource all the time, until everyone else gives up. Therefore, for the non-cooperative agents, we consider an infinitely repeated resource allocation game with discounting. This game is symmetric and all the agents are identical, so we look for its symmetric subgame-perfect equilibria. (Bhaskar (2000)) proposed a solution for 2-agent, 1-resource allocation games: Agents start by symmetrically choosing their actions randomly, and as soon as they each choose different actions, they start to follow a convention that prescribes their actions from then on. We extend the concept of convention to the general resource allocation problems of N agents and C resources. We show that for any convention, there exists a symmetric subgame-perfect equilibrium that implements it. We present two conventions: bourgeois, where agents stick to the first allocation; and market, where agents pay for the use of resources, and observe a global coordination signal that allows them to alternate between different allocations. We define the price of anonymity of a convention as the ratio between the maximum social payoff of any (asymmetric) strategy profile and the expected social payoff of the convention. We show that the price of anonymity of the bourgeois convention is infinite. The market convention decreases this price by reducing the conflict between the agents.

Official source

Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.

Concepts associés

Chargement

Publications associées

Chargement

Publications associées (5)

Chargement

Chargement

Chargement

Concepts associés (15)

Jeu

thumb|Les Joueurs de cartes de Paul Cézanne (1892-1895, Institut Courtauld).
Le jeu est une activité, humaine ou animale, pratiquée pour se divertir.
Propre aux mammifères, cette activité d'ordre ps

Global Positioning System

Le Global Positioning System (GPS) (en français : « Système mondial de positionnement » [littéralement] ou « Géo-positionnement par satellite »), originellement connu sous le nom de Navstar GPS, est

Stratégie (théorie des jeux)

En théorie des jeux, la stratégie d'un joueur est l’une des options qu’il choisit dans un contexte où le résultat dépend non seulement de ses propres actions, mais également de celles des autres . L

To achieve an optimal outcome in many situations, agents need to choose distinct actions from one another. This is the case notably in many resource allocation problems, where a single resource can only be used by one agent at a time. How shall a designer of a multi-agent system program its identical agents to behave each in a different way? From a game theoretic perspective, such situations lead to undesirable Nash equilibria. For example consider a resource allocation game in that two players compete for an exclusive access to a single resource. It has three Nash equilibria. The two pure-strategy NE are efficient, but not fair. The one mixed-strategy NE is fair, but not efficient. Aumann's notion of correlated equilibrium fixes this problem: It assumes a correlation device that suggests each agent an action to take. However, such a "smart" coordination device might not be available. We propose using a randomly chosen, "stupid" integer coordination signal. "Smart" agents learn which action they should use for each value of the coordination signal. We present a multi-agent learning algorithm that converges in polynomial number of steps to a correlated equilibrium of a channel allocation game, a variant of the resource allocation game. We show that the agents learn to play for each coordination signal value a randomly chosen pure-strategy Nash equilibrium of the game. Therefore, the outcome is an efficient correlated equilibrium. This CE becomes more fair as the number of the available coordination signal values increases.

We analyze symmetric protocols to rationally coordinate on an asymmetric, efficient allocation in an infinitely repeated N-agent, C-resource allocation problems, where the resources are all homogeneous. Bhaskar proposed one way to achieve this in 2-agent, 1-resource games: Agents start by symmetrically randomizing their actions, and as soon as they each choose different actions, they start to follow a potentially asymmetric "convention" that prescribes their actions from then on. We extend the concept of convention to the general case of infinitely repeated resource allocation games with N agents and C resources. We show that for any convention, there exists a symmetric subgame-perfect equilibrium which implements it. We present two conventions: bourgeois, where agents stick to the first allocation; and market, where agents pay for the use of resources, and observe a global coordination signal which allows them to alternate between different allocations. We define price of anonymity of a convention as a ratio between the maximum social payoff of any (asymmetric) strategy profile and the expected social payoff of the subgame-perfect equilibrium which implements the convention. We show that while the price of anonymity of the bourgeois convention is infinite, the market convention decreases this price by reducing the conflict between the agents.

Many games have undesirable Nash equilibria. For exam- ple consider a resource allocation game in which two players compete for an exclusive access to a single resource. It has three Nash equilibria. The two pure-strategy NE are effi- cient, but not fair. The one mixed-strategy NE is fair, but not efficient. Aumann’s notion of correlated equilibrium fixes this problem: It assumes a correlation device which suggests each agent an action to take. However, such a “smart” coordination device might not be available. We propose using a randomly chosen, “stupid” in- teger coordination signal. “Smart” agents learn which action they should use for each value of the coordination signal. We present a multi-agent learning algorithm which con- verges in polynomial number of steps to a correlated equilib- rium of a wireless channel allocation game, a variant of the resource allocation game. We show that the agents learn to play for each coordination signal value a randomly chosen pure-strategy Nash equilibrium of the game. Therefore, the outcome is an efficient correlated equilibrium. This CE be- comes more fair as the number of the available coordination signal values increases. We believe that a similar approach can be used to reach efficient and fair correlated equilibria in a wider set of games, such as potential games.

2011