
The Reactor: A Sample-Efficient Actor-Critic Architecture

Abstract · Apr 15, 2017 15:38

Tags: off, reactor, retrace, step, actor, a3c, critic, policy, dqn, mnih, cs-ai

arXiv Abstract

  • Audrunas Gruslys
  • Mohammad Gheshlaghi Azar
  • Marc G. Bellemare
  • Remi Munos

In this work we present a new reinforcement learning agent, called Reactor (for Retrace-actor), based on an off-policy multi-step return actor-critic architecture. The agent uses a deep recurrent neural network for function approximation. The network outputs a target policy π (the actor), an action-value Q-function (the critic) evaluating the current policy π, and an estimated behavioral policy μ̂ which we use for off-policy correction. The agent maintains a memory buffer filled with past experiences. The critic is trained by the multi-step off-policy Retrace algorithm, and the actor is trained by a novel β-leave-one-out policy gradient estimate (which uses both the off-policy corrected return and the estimated Q-function). The Reactor is sample-efficient thanks to the use of memory replay, and numerically efficient since it uses multi-step returns. Moreover, both acting and learning can be parallelized. We evaluate our algorithm on 57 Atari 2600 games and demonstrate that it achieves state-of-the-art performance.
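
For intuition, the critic's Retrace target can be sketched as follows. This is a minimal illustrative example in Python/NumPy, not the authors' implementation: it assumes a single finite trajectory with a small discrete action set and ignores the recurrent network, the replay buffer, and the estimated behavior policy μ̂; the function name retrace_targets, the truncation constant lam, and the array layout are assumptions made for this example.

import numpy as np

# Sketch of Retrace(lambda) targets (Munos et al., 2016) for one trajectory.
# q       : (T+1, A) critic estimates Q(x_t, a)
# pi      : (T+1, A) target-policy probabilities pi(a | x_t)
# mu      : (T,)     behaviour probabilities mu(a_t | x_t) of the actions taken
# actions : (T,)     actions actually taken
# rewards : (T,)     rewards received
def retrace_targets(q, pi, mu, actions, rewards, gamma=0.99, lam=1.0):
    T = len(rewards)
    v = (pi * q).sum(axis=1)                        # E_pi[Q(x_t, .)] at each step
    q_taken = q[np.arange(T), actions]              # Q(x_t, a_t)
    # Truncated importance weights c_t = lam * min(1, pi(a_t|x_t) / mu(a_t|x_t)).
    c = lam * np.minimum(1.0, pi[np.arange(T), actions] / mu)
    # TD errors delta_t = r_t + gamma * E_pi[Q(x_{t+1}, .)] - Q(x_t, a_t).
    delta = rewards + gamma * v[1:] - q_taken
    # Accumulate corrections backwards:
    # target_t = Q(x_t, a_t) + sum_{s>=t} gamma^{s-t} (prod_{i=t+1}^{s} c_i) delta_s.
    targets = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):
        acc = delta[t] + (gamma * c[t + 1] * acc if t + 1 < T else 0.0)
        targets[t] = q_taken[t] + acc
    return targets

The critic is regressed toward targets of this form, while the paper's β-leave-one-out estimator reuses the same off-policy corrected return together with the critic's Q-values to build the policy-gradient update for the actor.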

Read the paper (pdf) »