Course
This course provides a mathematical treatment of online decision-making. It covers bandits (multi-armed, contextual, structured), Markov Decision Processes (MDPs), and related topics. Key concepts include the exploration-exploitation trade-off, upper confidence bound (UCB) algorithms, Thompson sampling, and the tools used to derive regret bounds. ...
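As a brief illustration of the central objects studied (a standard formulation; the notation below is illustrative and not taken from the course material): for a $K$-armed bandit with mean rewards $\mu_1,\dots,\mu_K$ and $\mu^\star = \max_i \mu_i$, the cumulative regret of a policy playing arms $A_1,\dots,A_T$ over $T$ rounds is
\[
R_T \;=\; T\mu^\star \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{A_t}\right],
\]
and the classical UCB1 rule selects, at round $t$,
\[
A_t \;\in\; \arg\max_{i}\; \hat{\mu}_i(t-1) \;+\; \sqrt{\frac{2\log t}{N_i(t-1)}},
\]
where $\hat{\mu}_i(t-1)$ and $N_i(t-1)$ denote the empirical mean reward and the number of pulls of arm $i$ before round $t$. Regret bounds quantify how fast $R_T$ grows with $T$ for such rules.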