arxivst stuff from arxiv that you should probably bookmark

Asynchronous Parallel Empirical Variance Guided Algorithms for the Thresholding Bandit Problem

Abstract · Apr 15, 2017 02:42

pull locatelli rochester thresholding arms rounds bandit mab asynchronous round stat-ml cs-lg

Arxiv Abstract

  • Jie Zhong
  • Yijun Huang
  • Ji Liu

This paper considers the multi-armed thresholding bandit problem recently proposed by Locatelli et al. [2016]: identifying all arms above a predefined threshold via as few pulls (or rounds) as possible. Although the algorithm proposed in Locatelli et al. [2016] achieves the optimal round complexity in a certain sense, some issues remain unsolved. This paper proposes an asynchronous parallel thresholding algorithm and a parameter-free version of it to improve efficiency and applicability. On the one hand, the two proposed algorithms use the empirical variance to guide which arm to pull at each round, and improve on the round complexity of the “optimal” algorithm when all arms have bounded high-order moments. On the other hand, most bandit algorithms assume either that the reward is observed immediately after the pull or that the next decision is not made until all rewards have been observed. The proposed asynchronous parallel algorithms allow the next pull to be chosen while rewards from earlier pulls are still unobserved, which avoids this unrealistic assumption and significantly improves the identification process. Our theoretical analysis justifies the effectiveness and efficiency of the proposed asynchronous parallel algorithms. An empirical study is also provided to validate the proposed algorithms.
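To make the problem setting concrete, here is a minimal sequential sketch in Python. It is not the paper's algorithm. With `use_variance=False` it follows the style of index used by the APT algorithm of Locatelli et al. [2016], pulling the arm that minimizes sqrt(n_i) * (|mean_i - tau| + eps). With `use_variance=True` it divides that index by the empirical standard deviation, which is only a hypothetical illustration of what "variance-guided" could look like; the paper's actual index, its parameter-free version, and its asynchronous handling of delayed rewards are not reproduced here. The names `thresholding_bandit`, `gaussian_arm`, `eps`, and the `1e-3` smoothing constant are all assumptions made for this example.

```python
import math
import random


def gaussian_arm(mean, std):
    """Hypothetical helper: an arm with Gaussian rewards."""
    return lambda rng: rng.gauss(mean, std)


def thresholding_bandit(arms, tau, budget, eps=0.1, use_variance=False, seed=0):
    """Sequentially pull arms, then return (indices with empirical mean >= tau, pull counts).

    arms   : list of callables taking a random.Random and returning a reward
    tau    : the threshold
    budget : total number of pulls (must be at least 2 per arm)
    eps    : precision parameter added to the empirical gap
    """
    k = len(arms)
    if budget < 2 * k:
        raise ValueError("budget must allow two initial pulls per arm")

    rng = random.Random(seed)
    counts = [0] * k
    sums = [0.0] * k
    sq_sums = [0.0] * k

    def pull(i):
        reward = arms[i](rng)
        counts[i] += 1
        sums[i] += reward
        sq_sums[i] += reward * reward

    # Two initial pulls per arm so the empirical variance is defined.
    for i in range(k):
        pull(i)
        pull(i)

    def index(i):
        n = counts[i]
        mean = sums[i] / n
        gap = abs(mean - tau) + eps
        base = math.sqrt(n) * gap  # APT-style index
        if not use_variance:
            return base
        # Assumption, not from the paper: scale by the empirical standard
        # deviation so that noisier arms look "less resolved" and get more pulls.
        var = max(sq_sums[i] / n - mean * mean, 0.0)
        return base / math.sqrt(var + 1e-3)

    # Spend the remaining budget on the arm whose status is least resolved.
    for _ in range(budget - 2 * k):
        pull(min(range(k), key=index))

    above = {i for i in range(k) if sums[i] / counts[i] >= tau}
    return above, counts
```

For example, calling `thresholding_bandit([gaussian_arm(0.1, 0.05), gaussian_arm(0.35, 0.3), gaussian_arm(0.65, 0.3), gaussian_arm(0.9, 0.05)], tau=0.5, budget=400)` returns the set of arm indices judged to be above the threshold together with the number of pulls each arm received. Comparing the pull counts with and without `use_variance=True` gives a rough feel for how a variance-aware index shifts pulls toward noisier arms.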
