Motivated by alternating game-play in two-player games, we study an alternating variant of Online Linear Optimization (OLO). In alternating OLO, a learner at each round $t \in [T]$ selects a vector $x_t$, and then an adversary selects a cost vector $c_t \in [-1,1]^n$. The learner then incurs cost $(c_t + c_{t-1})^\top x_t$ instead of $c_t^\top x_t$ as in standard OLO. We establish that under this small twist, the $\Omega(\sqrt{T})$ lower bound on the regret is no longer valid. More precisely, we present two online learning algorithms for alternating OLO that respectively admit $O((\log n)^{4/3} T^{1/3})$ regret for the $n$-dimensional simplex and $O(\rho \log T)$ regret for the ball of radius $\rho > 0$. Our results imply that in alternating game-play, an agent can always guarantee $\tilde{O}((\log n)^{4/3} T^{1/3})$ regret regardless of the strategies of the other agent, while the regret bound improves to $O(\log T)$ when the agent has only two actions.
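To make the alternating OLO protocol concrete, here is a minimal simulation sketch. It is not one of the paper's algorithms: the learner below is plain projected online gradient descent on the ball, the adversary is random, and all parameter choices (dimension, horizon, radius, step size) are illustrative assumptions. It only demonstrates the feedback structure, where round $t$'s cost is $(c_t + c_{t-1})^\top x_t$, and how regret against the best fixed action is measured.

```python
# Minimal sketch of the alternating OLO protocol (illustrative only; the
# learner is plain projected online gradient descent, NOT an algorithm
# from the paper, and all parameter choices are assumptions).
import numpy as np

rng = np.random.default_rng(0)
n, T, rho = 5, 10_000, 1.0           # dimension, horizon, ball radius (assumed)

def project_ball(x, radius):
    """Project x onto the Euclidean ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

x = np.zeros(n)                       # learner's first point
c_prev = np.zeros(n)                  # convention: c_0 = 0 on the first round
combined_costs = []                   # stores c_t + c_{t-1} for each round
loss = 0.0

for t in range(1, T + 1):
    c_t = rng.uniform(-1.0, 1.0, size=n)    # adversary: here just random
    combined = c_t + c_prev                  # the alternating twist
    loss += combined @ x                     # learner pays (c_t + c_{t-1})^T x_t
    combined_costs.append(combined)
    # Assumed learner update: projected OGD with a 1/sqrt(t) step size.
    x = project_ball(x - combined / np.sqrt(t), rho)
    c_prev = c_t

# For linear losses over the ball, the best fixed action in hindsight is
# -rho * (sum of combined costs) / its norm, giving loss -rho * ||sum||.
total = np.sum(combined_costs, axis=0)
best_fixed_loss = -rho * np.linalg.norm(total)
print("regret:", loss - best_fixed_loss)
```

The comparator step uses the standard fact that for linear losses over the ball the optimal fixed point in hindsight is $-\rho\,\bar{c}/\lVert\bar{c}\rVert$, where $\bar{c}$ is the sum of the combined cost vectors.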