This paper isn't just about Pong: it simulates environments ranging from 2D Atari games to 3D racing sims. It reports DQN scores on a range of games (and goes into detail on how those scores were obtained), but it doesn't explicitly compare those scores with the current state of the art.
Highlights From the Paper
- We also introduce a simulator that does not need to predict visual inputs after every action, reducing the computational burden in the use of the model.
- Indeed we found that the higher the number of consecutive prediction-dependent transitions, the more the model is encouraged to focus on learning the global dynamics of the environment, which results in higher long-term accuracy.
- Whilst the LSTM memory and our training scheme have proven able to capture long-term dependencies, alternative memory structures are required in order, for example, to learn spatial coherence at a more global level than the one displayed by our model in the 3D mazes, and thus support navigation.
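The two highlighted ideas, transitioning on actions without decoding a frame at every step, and training with runs of prediction-dependent transitions, can be sketched in toy form. This is an illustrative NumPy sketch under assumed dimensions and random weights, not the paper's actual architecture (which uses a convolutional encoder/decoder around an LSTM); `rollout`, `pred_dependent`, and the weight names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions and random weights -- an illustrative sketch only.
STATE, OBS, ACT = 8, 16, 3
W_trans = rng.normal(scale=0.1, size=(STATE, STATE + ACT))  # action-dependent transition
W_enc   = rng.normal(scale=0.1, size=(STATE, OBS))          # frame -> state input
W_dec   = rng.normal(scale=0.1, size=(OBS, STATE))          # state -> predicted frame

def step(state, action_onehot):
    """Advance the latent state from the action alone; no frame is decoded here,
    which is what saves computation when intermediate frames aren't needed."""
    return np.tanh(W_trans @ np.concatenate([state, action_onehot]))

def decode(state):
    """Predict a frame only when one is actually requested."""
    return W_dec @ state

def rollout(frames, actions, pred_dependent):
    """Roll the simulator forward for len(actions) steps. For the final
    `pred_dependent` steps the real next frame is NOT folded back into the
    state, so the state evolves from actions and the model's own dynamics --
    a toy version of consecutive prediction-dependent transitions."""
    state = np.tanh(W_enc @ frames[0])  # initialise from the first observed frame
    preds = []
    T = len(actions)
    for t in range(T):
        state = step(state, actions[t])
        preds.append(decode(state))
        # Observation-dependent transition: condition on the observed frame.
        if t < T - pred_dependent and t + 1 < len(frames):
            state = np.tanh(state + W_enc @ frames[t + 1])
    return preds
```

Increasing `pred_dependent` during training forces the model to stay accurate over long self-conditioned rollouts, which is the mechanism the second highlight credits for better long-term accuracy.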