Minimizing Regret in Multi-armed Bandit Models by Iterative Saddle-point Solving -- Wouter Koolen

  • When Nov 20, 2019 from 02:00 PM to 03:00 PM (Europe/Amsterdam / UTC+01:00)
  • Where L102

This is an extended version of my recent Scientific Meeting presentation.

UCB is king in stochastic bandits: it is computationally efficient, performs very well in practice, and matches the regret-rate lower bound. Can we achieve these same features under structural assumptions on the arm constellation (Lipschitzness, unimodality, sparsity, ...)? We present a new strategy for doing so.
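For reference, here is a minimal sketch of the classical UCB1 index policy (empirical mean plus a confidence bonus); the Bernoulli arms and the function name `ucb1` are illustrative choices, not part of the talk.

```python
import numpy as np

def ucb1(arm_means, horizon, rng=None):
    """Classical UCB1: pull each arm once, then play the arm whose
    empirical mean plus confidence bonus is largest.

    `arm_means` are Bernoulli success probabilities (demo assumption)."""
    rng = rng or np.random.default_rng(0)
    k = len(arm_means)
    counts = np.zeros(k)
    sums = np.zeros(k)
    for t in range(horizon):
        if t < k:
            a = t  # initialization: pull each arm once
        else:
            bonus = np.sqrt(2 * np.log(t + 1) / counts)
            a = int(np.argmax(sums / counts + bonus))
        reward = rng.binomial(1, arm_means[a])
        counts[a] += 1
        sums[a] += reward
    return counts  # pull counts concentrate on the best arm
```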

Our starting point is the generic Graves & Lai instance-dependent regret lower bound, which takes the form of a semi-infinite linear program. We first recast this program as a two-player zero-sum game, and then discuss iterative methods for computing its equilibrium based on full-information online learning. From the resulting iterates we obtain efficient strategies for the structured bandit problem. Moreover, by leveraging the regret guarantees of the employed online learners, we prove finite-time, asymptotically optimal regret guarantees for any structured bandit problem.
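To fix ideas, here is one common way to state the Graves & Lai bound; the notation (gaps Δ_a and the set Λ(θ) of confusing alternative models allowed by the structure) is an assumption of this sketch, with details depending on the structured class.

```latex
% Graves & Lai: for a model \theta with suboptimality gaps \Delta_a,
% any uniformly good policy satisfies
%   \liminf_{T \to \infty} R_T / \log T \ge C(\theta),
% where C(\theta) is the value of the semi-infinite linear program
\begin{align*}
  C(\theta) \;=\; \min_{w \ge 0} \; \sum_a w_a \Delta_a
  \quad \text{s.t.} \quad
  \sum_a w_a \,\mathrm{KL}(\theta_a, \lambda_a) \;\ge\; 1
  \quad \text{for all } \lambda \in \Lambda(\theta).
\end{align*}
% The program is linear in w but has one constraint per alternative
% \lambda, hence "semi-infinite". Rewriting the constraints as
%   \inf_{\lambda \in \Lambda(\theta)} \sum_a w_a \mathrm{KL}(\theta_a, \lambda_a) \ge 1
% exposes a best-response subproblem, which is what makes a two-player
% zero-sum game reformulation natural.
```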
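The equilibrium-computation step can be illustrated on a finite zero-sum game. The sketch below (an illustration of the general technique, not the talk's specific algorithm) has the max-player run Hedge, a full-information online learner, against exact best responses by the min-player; the averaged iterates approximate a saddle point at a rate inherited from Hedge's regret bound.

```python
import numpy as np

def solve_zero_sum_game(A, iterations=5000, lr=None):
    """Approximate an equilibrium of the matrix game max_x min_y x^T A y.

    The row player runs Hedge (exponential weights) on full-information
    payoffs; the column player best-responds each round.  The averaged
    strategies converge to an approximate saddle point at rate
    O(sqrt(log(n_rows) / iterations)), from Hedge's regret guarantee."""
    n_rows, n_cols = A.shape
    if lr is None:
        lr = np.sqrt(np.log(n_rows) / iterations)  # order-optimal for bounded payoffs
    log_weights = np.zeros(n_rows)
    x_avg, y_avg = np.zeros(n_rows), np.zeros(n_cols)
    for _ in range(iterations):
        # Hedge plays the current exponential-weights distribution.
        x = np.exp(log_weights - log_weights.max())
        x /= x.sum()
        # Column player best-responds: pure column minimizing x^T A e_j.
        j = int(np.argmin(x @ A))
        # Full-information update: Hedge observes the whole payoff column.
        log_weights += lr * A[:, j]
        x_avg += x
        y_avg[j] += 1
    return x_avg / iterations, y_avg / iterations

# Demo: matching pennies has value 0 with uniform equilibrium strategies.
if __name__ == "__main__":
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])
    x, y = solve_zero_sum_game(A)
    print(x, y, x @ A @ y)  # both near (0.5, 0.5), value near 0
```

In the structured-bandit setting the same pattern applies with the allocation player and the alternative-model player taking these two roles; the online learner's regret bound is what turns iterate averages into finite-time guarantees.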