Examines Reinforcement Learning from Human Feedback (RLHF), discussing the convergence of estimators and introducing a pessimistic approach for improved performance.
Covers MuZero, an algorithm whose learned model is applied iteratively to predict rewards, action-selection policies, and values, achieving state-of-the-art performance in board games and Atari video games.