Instrument tracking is an essential requirement for various computer-assisted interventions. To overcome problems such as specular reflections and motion blur, we propose a novel method that takes advantage of the interdependency between localization and segmentation of the tool. In particular, we reformulate 2D pose estimation as heatmap regression and thereby enable a robust, concurrent regression of both tasks. Our experimental results demonstrate that this modeling leads to significantly higher accuracy than directly regressing the tool's coordinates. We compare the performance to the state of the art on a Retinal Microsurgery benchmark and the EndoVis Challenge.
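To illustrate what reformulating 2D pose estimation as heatmap regression means, here is a minimal sketch. It assumes that each tool landmark is encoded as a Gaussian blob of width `sigma` and decoded by taking the location of the maximum response. The function names, the Gaussian target, and the default `sigma` are illustrative assumptions, not the authors' implementation. Under this scheme, a network would be trained to predict one such map per landmark instead of regressing raw (x, y) coordinates.

```python
import math
from typing import List, Tuple

# A heatmap is stored as rows (y) of columns (x).
Heatmap = List[List[float]]


def keypoint_to_heatmap(x: float, y: float, width: int, height: int,
                        sigma: float = 2.0) -> Heatmap:
    """Render a 2D keypoint as an unnormalized Gaussian heatmap (peak value 1.0).

    The Gaussian target and sigma value are illustrative assumptions.
    """
    return [
        [math.exp(-((u - x) ** 2 + (v - y) ** 2) / (2.0 * sigma ** 2))
         for u in range(width)]
        for v in range(height)
    ]


def heatmap_to_keypoint(heatmap: Heatmap) -> Tuple[int, int]:
    """Decode the keypoint as the (x, y) pixel with the maximum response."""
    best_val = -math.inf
    best = (0, 0)
    for v, row in enumerate(heatmap):
        for u, val in enumerate(row):
            if val > best_val:
                best_val = val
                best = (u, v)
    return best


if __name__ == "__main__":
    # Encode a hypothetical tool-tip position and recover it from the map.
    hm = keypoint_to_heatmap(12, 7, width=32, height=24)
    print(heatmap_to_keypoint(hm))  # (12, 7)
```

One plausible motivation for this design choice is that a heatmap is a dense, image-aligned target, much like a segmentation mask. This could make it easier for a single network to learn both tasks together than if it had to map a whole image to a pair of numbers.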