We apply diffusion strategies to develop a cooperative reinforcement learning algorithm in which agents in a network communicate with their neighbors to improve predictions about their environment. The algorithm is suitable for off-policy learning, even in large state spaces. We provide a mean-square-error performance analysis under constant step-sizes. The benefits of cooperation, in the form of improved stability and reduced bias and variance in the prediction error, are illustrated in the context of a classical model. We show that the improvement in performance is especially significant when the behavior policy of the agents differs from the target policy under evaluation.
Ali H. Sayed, Kun Yuan, Lucas Cesar Eduardo Cassano
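To make the diffusion mechanism concrete, below is a minimal sketch of a generic adapt-then-combine (ATC) diffusion TD(0) update with linear function approximation and importance-sampling ratios for off-policy evaluation. The toy chain environment, the ring combination matrix, and all names (`PHI`, `behavior`, `target`, `MU`) are illustrative assumptions for this sketch, not the paper's exact recursion or notation.

```python
# Illustrative sketch: adapt-then-combine diffusion TD(0) with linear
# features and importance-sampling ratios for off-policy evaluation.
# The environment, network, and constants are assumptions, not the
# paper's exact algorithm.
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, N_STATES, N_FEATURES = 4, 10, 4
GAMMA, MU = 0.9, 0.05            # discount factor, constant step-size

# Random linear features, one row per state (illustrative).
PHI = rng.standard_normal((N_STATES, N_FEATURES))

# Doubly stochastic combination matrix over a ring network.
A = np.zeros((N_AGENTS, N_AGENTS))
for k in range(N_AGENTS):
    A[k, k] = 0.5
    A[k, (k - 1) % N_AGENTS] = 0.25
    A[k, (k + 1) % N_AGENTS] = 0.25

# Behavior and target policies over two actions; the mismatch makes the
# problem off-policy and yields nontrivial importance ratios.
behavior = np.array([0.5, 0.5])
target = np.array([0.9, 0.1])

def step(s, a):
    """Toy chain dynamics: action 0 moves right, action 1 moves left."""
    s_next = (s + 1) % N_STATES if a == 0 else (s - 1) % N_STATES
    reward = 1.0 if s_next == 0 else 0.0
    return s_next, reward

W = np.zeros((N_AGENTS, N_FEATURES))   # one weight vector per agent
states = rng.integers(N_STATES, size=N_AGENTS)

for t in range(5000):
    psi = np.empty_like(W)             # intermediate (adapted) iterates
    for k in range(N_AGENTS):
        s = states[k]
        a = rng.choice(2, p=behavior)      # act with the behavior policy
        s_next, r = step(s, a)
        rho = target[a] / behavior[a]      # importance-sampling ratio
        td_err = r + GAMMA * PHI[s_next] @ W[k] - PHI[s] @ W[k]
        # Adapt: local off-policy TD(0) step with a constant step-size.
        psi[k] = W[k] + MU * rho * td_err * PHI[s]
        states[k] = s_next
    # Combine: each agent averages the intermediate iterates of its
    # neighbors; this is the only communication in the network.
    W = A @ psi

print("per-agent weights after training:\n", W)
```

The adapt step is purely local, while the combine step `W = A @ psi` captures the neighbor-to-neighbor averaging described in the abstract; a doubly stochastic choice of `A` is one common way to ensure the agents' iterates agree in the limit.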