We present a general method for maintaining estimates of the distribution of parameters in arbitrary models. We then apply this method to estimating probability distributions over actions in value-based reinforcement learning. While this approach is similar to other techniques that maintain a confidence measure over action-values, it nevertheless offers new insight into current techniques and reveals potential avenues for further research.
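The abstract does not specify the estimator, so the following is only a hypothetical sketch of the general idea of maintaining a distribution over action-values rather than point estimates: each action keeps an independent Gaussian estimate (a conjugate normal-normal update under assumed unit observation noise), and actions are chosen by Thompson sampling on a toy multi-armed bandit. All names and settings here are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 3
true_means = np.array([0.1, 0.5, 0.9])   # unknown to the agent

# Gaussian belief N(mu, 1/precision) over each action-value
# (illustrative choice; the paper's actual model is not specified here).
mu = np.zeros(n_actions)
precision = np.ones(n_actions) * 1e-2    # weak prior

for t in range(2000):
    # Thompson sampling: draw one value per action from its belief,
    # then act greedily with respect to the drawn samples.
    samples = rng.normal(mu, 1.0 / np.sqrt(precision))
    a = int(np.argmax(samples))
    reward = rng.normal(true_means[a], 1.0)
    # Conjugate normal-normal update with unit observation noise:
    # precision grows by one, mean shifts toward the observed reward.
    precision[a] += 1.0
    mu[a] += (reward - mu[a]) / precision[a]

print(int(np.argmax(mu)))
```

Because the belief for a rarely tried action stays wide, its samples occasionally dominate and force exploration; as evidence accumulates, the beliefs concentrate and the agent settles on the best action. This is one standard way a "confidence measure for action-values" drives exploration.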
Pierre Vandergheynst, Milos Vasic, Francesco Craighero, Renata Khasanova