arxivst stuff from arxiv that you should probably bookmark

Lean From Thy Neighbor: Stochastic & Adversarial Bandits in a Network

Abstract · Apr 14, 2017 16:24 ·

neighbors peers own regret her farmers band setting social seed cs-lg cs-si

Arxiv Abstract

  • L. Elisa Celis
  • Farnood Salehi

An individual’s decisions are often guided by those of his or her peers, i.e., neighbors in a social network. Presumably, being privy to the experiences of others aids in learning and decision making, but how much advantage does an individual gain by observing her neighbors? Such problems make appearances in sociology and economics and, in this paper, we present a novel model to capture such decision-making processes and appeal to the classical multi-armed bandit framework to analyze it. Each individual, in addition to her own actions, can observe the actions and rewards obtained by her neighbors, and can use all of this information in order to minimize her own regret. We provide algorithms for this setting, both for stochastic and adversarial bandits, and show that their regret smoothly interpolates between the regret in the classical bandit setting and that of the full-information setting as a function of the neighbors’ exploration. In the stochastic setting the additional information must simply be incorporated into the usual estimation of the rewards, while in the adversarial setting this is attained by constructing a new unbiased estimator for the rewards and appropriately bounding the amount of additional information provided by the neighbors. These algorithms are optimal up to log factors; despite the fact that the agents act independently and selfishly, this implies that it is an approximate Nash equilibria for all agents to use our algorithms. Further, we show via empirical simulations that our algorithms, often significantly, outperform existing algorithms that one could apply to this setting.

Read the paper (pdf) »