
Finite Sample Analysis for TD(0) with Linear Function Approximation

Abstract · Apr 4, 2017 19:47

cs-ai

Arxiv Abstract

  • Gal Dalal
  • Balázs Szörényi
  • Gugan Thoppe
  • Shie Mannor

TD(0) is one of the most commonly used algorithms in reinforcement learning. Despite this, there is no existing finite sample analysis for TD(0) with function approximation, even for the linear case. Our work is the first to provide such a result. Prior works that obtained concentration bounds for online Temporal Difference (TD) methods analyzed modified versions of these methods, carefully crafted for the analyses to hold. These modifications include projections and step-sizes that depend on unknown problem parameters. Our analysis obviates these artificial alterations by exploiting strong properties of TD(0) and tailor-made stochastic approximation tools.
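
For readers who want a concrete picture of the unmodified algorithm the abstract refers to, here is a minimal sketch of TD(0) with linear function approximation, with no projection step and a step-size that uses no problem-specific constants. It is an illustration under stated assumptions, not the authors' code: the function name `td0_linear`, the `step` callback, the toy two-state chain, and the step-size schedule `(n + 1) ** -sigma` are all choices made for this example.

```python
def td0_linear(step, features, start_state, gamma, num_steps, sigma=0.5):
    """Plain TD(0) with a linear value estimate V(s) ~ theta . phi(s).

    step(state) -> (reward, next_state) is a hypothetical environment callback.
    features maps each state to its feature vector phi(s).
    """
    theta = [0.0] * len(features[start_state])
    state = start_state
    for n in range(num_steps):
        reward, next_state = step(state)
        phi, phi_next = features[state], features[next_state]
        v = sum(t * f for t, f in zip(theta, phi))
        v_next = sum(t * f for t, f in zip(theta, phi_next))
        # Temporal-difference error for the observed transition.
        td_error = reward + gamma * v_next - v
        # Assumed schedule: decays polynomially, needs no unknown constants.
        alpha = (n + 1) ** -sigma
        # Unprojected update: theta is never clipped or projected onto a ball.
        theta = [t + alpha * td_error * f for t, f in zip(theta, phi)]
        state = next_state
    return theta


def cycle_step(state):
    # Hypothetical toy chain: 0 -> 1 -> 0 -> ..., reward 1 when leaving state 0.
    return (1.0, 1) if state == 0 else (0.0, 0)


# One-hot features, so the linear estimate can represent the true values exactly.
features = {0: [1.0, 0.0], 1: [0.0, 1.0]}
theta = td0_linear(cycle_step, features, start_state=0, gamma=0.5, num_steps=5000)
```

In this toy chain the Bellman equations give true values of 4/3 for state 0 and 2/3 for state 1, so `theta` can be compared against them directly. The modified variants mentioned in the abstract would add a projection of `theta` after each update or tune `alpha` using quantities that are unknown in practice.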
