Motivated by alternating game-play in two-player games, we study an alternating variant of Online Linear Optimization (OLO). In alternating OLO, a learner at each round t ∈ [T] selects a vector x_t and then an adversary selects a cost vector c_t ∈ [−1,1]^n. The learner then experiences cost (c_t + c_{t−1})^⊤ x_t instead of (c_t)^⊤ x_t as in standard OLO. We establish that under this small twist, the Ω(√T) lower bound on the regret is no longer valid. More precisely, we present two online learning algorithms for alternating OLO that respectively admit O((log n)^{4/3} T^{1/3}) regret for the n-dimensional simplex and O(ρ log T) regret for the ball of radius ρ > 0. Our results imply that in alternating game-play, an agent can always guarantee Õ((log n)^{4/3} T^{1/3}) regret regardless of the strategies of the other agent, and the regret bound improves to O(log T) when the agent has only two actions.
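To make the protocol concrete, here is a minimal simulation sketch of the alternating OLO interaction on the ball of radius ρ. The learner below is plain projected online gradient descent with a hypothetical step-size schedule, chosen only to illustrate the feedback structure (c_t + c_{t−1})^⊤ x_t; it is not the paper's algorithm, and the random adversary is an assumption for the demo.

```python
import numpy as np

def alternating_olo(T=200, n=3, rho=1.0, seed=0):
    """Simulate alternating OLO: at round t the learner pays
    (c_t + c_{t-1})^T x_t rather than c_t^T x_t as in standard OLO.
    Returns the regret of a simple projected-OGD learner against the
    best fixed point of the rho-ball in hindsight."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)               # learner's first play
    c_prev = np.zeros(n)          # convention: c_0 = 0, so round 1 is well defined
    total_cost = 0.0
    cs = []
    for t in range(1, T + 1):
        # adversary picks a cost vector in [-1, 1]^n (here: uniformly at random)
        c = rng.uniform(-1.0, 1.0, size=n)
        # learner suffers the *alternating* cost
        total_cost += float((c + c_prev) @ x)
        # gradient step on the observed alternating cost, then project onto the ball
        x = x - (rho / np.sqrt(t)) * (c + c_prev)
        norm = np.linalg.norm(x)
        if norm > rho:
            x = x * (rho / norm)
        cs.append(c)
        c_prev = c
    # best fixed x in the ball minimizes g^T x where g = sum_t (c_t + c_{t-1}),
    # attained at x* = -rho * g / ||g||, giving cost -rho * ||g||
    g = np.zeros(n)
    prev = np.zeros(n)
    for c in cs:
        g += c + prev
        prev = c
    best_fixed_cost = -rho * np.linalg.norm(g)
    return total_cost - best_fixed_cost
```

Swapping in a learner that exploits the c_{t−1} term (which it sees before choosing x_t's gradient step) is exactly what drives the improved T^{1/3} and log T rates in the abstract.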