UBEV - A More Practical Algorithm for Episodic RL with Near-Optimal PAC and Regret Guarantees

Abstract · Mar 22, 2017 15:34

cs-lg cs-ai stat-ml

Arxiv Abstract

  • Christoph Dann
  • Tor Lattimore
  • Emma Brunskill

We present UBEV, a simple and efficient reinforcement learning algorithm for fixed-horizon episodic Markov decision processes. The main contribution is a proof that UBEV enjoys a sample-complexity bound that holds for all accuracy levels simultaneously with high probability, and matches the lower bound except for logarithmic terms and one factor of the horizon. A consequence of the fact that our sample-complexity bound holds for all accuracy levels is that the new algorithm achieves a sub-linear regret of O(sqrt(SAT)), which is the first time the dependence on the size of the state space has provably appeared inside the square root. A brief empirical evaluation shows that UBEV is practically superior to existing algorithms with known sample-complexity guarantees.
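To make the point about the state space sitting inside the square root concrete, here is a minimal numeric sketch. It assumes S is the number of states, A the number of actions and T the amount of experience (the abstract does not define these symbols), and it ignores constants, logarithmic terms and horizon factors. The comparison term S*sqrt(A*T) is a hypothetical stand-in for a bound with the state count outside the root; it is not taken from the abstract, and the function names are illustrative only.

```python
import math


def regret_inside_root(S, A, T):
    """Leading term sqrt(S*A*T): state count inside the square root."""
    return math.sqrt(S * A * T)


def regret_outside_root(S, A, T):
    """Hypothetical comparison term S*sqrt(A*T): state count outside the root."""
    return S * math.sqrt(A * T)


def ratio(S, A, T):
    """How much larger the outside-the-root term is; algebraically sqrt(S)."""
    return regret_outside_root(S, A, T) / regret_inside_root(S, A, T)


if __name__ == "__main__":
    S, A = 100, 4
    for T in (10**4, 10**6, 10**8):
        inside = regret_inside_root(S, A, T)
        # Sub-linear regret: the per-step average inside / T shrinks as T grows.
        print(T, inside, inside / T, ratio(S, A, T))
```

Under these assumptions the gap between the two terms is a factor of sqrt(S) regardless of A and T, and the average regret per step falls toward zero as T grows, which is what sub-linear regret means.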
