Emilie Kaufmann | Revisiting the Exploration/Exploitation trade-off in bandit models

  • When Nov 21, 2016 from 03:00 PM to 04:00 PM (Europe/Amsterdam / UTC+0100)
  • Where L016

In multi-armed bandit models, there is a fundamental difference between optimization (a.k.a. best arm identification or pure exploration) and regret minimization. As we shall see, in both cases a notion of (asymptotically) optimal algorithm can be defined, and the behaviors of optimal algorithms for the two objectives are indeed quite different. Moreover, the complexities of the two problems feature different information-theoretic quantities. Building on this observation, we will investigate the best possible performance of an Explore-Then-Commit strategy for regret minimization, which starts with a pure-exploration phase to identify the best arm and then plays this estimated best arm until the end of the budget. In two-armed bandit models with Gaussian arms, we will see that the regret of such strategies is at least twice as large as that of the best strategies interleaving exploration and exploitation.
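As a rough illustration of the scheme discussed above, here is a minimal Python sketch of a generic Explore-Then-Commit strategy on a two-armed Gaussian bandit. All names (explore_then_commit, explore_rounds, etc.) are illustrative, the exploration length is fixed rather than optimized, and this is not the specific strategy analyzed in the talk, only the basic explore-then-commit template.

    import numpy as np

    def explore_then_commit(means, horizon, explore_rounds, rng=None):
        """Minimal Explore-Then-Commit sketch (illustrative, not the talk's
        analyzed strategy). `means` are the arm means, unknown to the learner;
        `horizon` is the total budget; `explore_rounds` is the number of pulls
        per arm during the pure-exploration phase."""
        rng = rng or np.random.default_rng()
        n_arms = len(means)
        sums = np.zeros(n_arms)
        counts = np.zeros(n_arms)
        regret = 0.0
        best_mean = max(means)

        def pull(arm):
            nonlocal regret
            r = rng.normal(means[arm], 1.0)  # unit-variance Gaussian reward
            sums[arm] += r
            counts[arm] += 1
            regret += best_mean - means[arm]  # expected regret of this pull

        # Pure-exploration phase: sample every arm a fixed number of times.
        for _ in range(explore_rounds):
            for arm in range(n_arms):
                pull(arm)

        # Commit phase: play the empirically best arm for the remaining budget.
        committed = int(np.argmax(sums / counts))
        for _ in range(horizon - n_arms * explore_rounds):
            pull(committed)

        return regret

    # Example run: two Gaussian arms with mean gap 0.5.
    print(explore_then_commit(means=[0.0, 0.5], horizon=10_000, explore_rounds=100))

The talk's question is how well the best possible strategy of this form can do, with the commit decision driven by an optimal pure-exploration procedure rather than a fixed exploration length as in this sketch.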

This talk is based on joint work with Aurélien Garivier and Tor Lattimore.