Want a more efficient attention mechanism that also saves computation? Baidu Research just introduced a model that uses policy optimization with two new actions, continue and stop, which lets the network learn how long to spend attending to a target. Even better, they released a Git repo with the source code. Before you run off into the wilds, one thing to keep in mind is that training still requires a few stages of strongly supervised fine-tuning. So it is not trained end-to-end, though the model itself is differentiable.
Save Time by Teaching Your Model When to Stop
Post · Mar 31, 2017 15:25
We propose a dynamic computational time model to reduce the average processing time of the recurrent visual attention model (RAM). Rather than attending for a fixed number of steps on each input image, the model learns to decide when to stop on the fly. To achieve this, we add an extra continue/stop action at each time step of RAM and use reinforcement learning to learn both the optimal attention policy and the stopping policy. The modification is simple but can dramatically reduce the average computational time while matching the recognition performance of RAM. Experimental results on the CUB-200-2011 and Stanford Cars datasets demonstrate that the dynamic computational time model works effectively for fine-grained image recognition. The source code for this paper can be obtained from https://github.com/baidu-research/DT-RAM
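To make the idea concrete, here is a minimal NumPy sketch of the inference loop with a learned stop action. Everything here is a toy stand-in, not the paper's architecture: the dimensions, the randomly initialized weights (which would be trained with REINFORCE-style policy gradients in the real model), and the `glimpse` function (which abstracts away RAM's actual attention mechanism) are all hypothetical. It shows only the core control flow: at each recurrent step, sample a continue/stop action from a policy head on the hidden state, and exit early when "stop" is chosen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions; the real DT-RAM architecture differs.
HIDDEN, GLIMPSE, MAX_STEPS = 8, 4, 6

# Random weights stand in for parameters learned via policy gradients.
W_h = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN + GLIMPSE))
w_stop = rng.normal(scale=0.1, size=HIDDEN)      # continue/stop policy head
W_cls = rng.normal(scale=0.1, size=(3, HIDDEN))  # toy 3-way classifier head

def glimpse(image_feat, h):
    """Stand-in for the attention glimpse; here just a fixed slice."""
    return image_feat[:GLIMPSE]

def dynamic_time_forward(image_feat):
    """Recurrent loop that halts when the sampled stop action fires."""
    h = np.zeros(HIDDEN)
    for t in range(1, MAX_STEPS + 1):
        g = glimpse(image_feat, h)
        h = np.tanh(W_h @ np.concatenate([h, g]))    # recurrent update
        p_stop = 1.0 / (1.0 + np.exp(-w_stop @ h))   # sigmoid stop probability
        if rng.random() < p_stop:                    # sample continue/stop
            break                                    # stop: classify now
    logits = W_cls @ h
    return int(logits.argmax()), t  # prediction and steps actually used

pred, steps = dynamic_time_forward(rng.normal(size=16))
print(pred, steps)
```

Because the number of steps is sampled per input, easy images can terminate after one or two glimpses while harder ones use the full budget, which is where the average-time savings come from.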