Marc Volkert | Confidence Bound Algorithms in Game Trees

Marc's will speak about his MSc research project on Game Tree Search in preparation for his Leiden MSc thesis defence.
  • When Oct 05, 2017 from 03:00 PM to 05:00 PM (Europe/Amsterdam / UTC200)
  • Where L016
  • Add event to calendar iCal

Since Google's AlphaGo defeated the grandmaster of the ancient Chinese game Go in the end of 2015, game tree search algorithms have received increased attention. In this talk we will focus on such algorithms where the goal is to find a best move in a two-player zero-sum game. Important concepts related to game tree search are going to be introduced, such as the formal representation of game trees, minimax search, pruning trees, and Monte Carlo tree search, which is a method to greatly reduce the complexity when searching a tree by applying stochastic elements to the problem.

Confidence bound algorithms are a state-of-the-art approach of doing Monte Carlo tree search and reliably find a best move. The FindTopWinner algorithm (Teraoka, Hatano, and Takimoto, 2014) will be discussed, which combines confidence bounds with an epoch-wise pruning strategy. Its shortcomings will be outlined and improvements will be investigated, with the goal of deriving a new algorithm that fares better in terms of total sample complexity, that is, the amount of times the game has to be simulated to the end in order to make a reliable decision on a move. Most notably, Bernstein’s inequality for a tighter confidence bound, and the Law of the Iterated Logarithm for optimal stopping and pruning (Balsubramani, 2014) were considered. The steps towards new algorithms will be explained and the final algorithms will be validated and compared by simulation studies.

Kazuki Teraoka, Kohei Hatano, and Eiji Takimoto. Efficient sampling methods for Monte Carlo tree search problem. IEICE TRANSACTIONS on Information and Systems, 97(3):392-398, 2014.

Akshay Balsubramani. Sharp finite-time Iterated-Logarithm martingale concentration. arXiv preprint arXiv:1405.2639, 2014.