The next time we do a Kaggle competition we’ll try this technique. When working on an image segmentation problem, it’s those last couple of pixels around the edges that make all the difference in your mIoU.
This new paper acknowledges that some pixels are harder to classify than others, and comes up with a pretty good solution to the problem. It’s called a deep layer cascade, and they very helpfully show you how to convert an existing model. Unlike similar solutions, runtime is speedy because the later stages only have to look at a much smaller set of hard-to-classify pixels.
Highlights From the Paper
- “Achieves a mIoU of 80.3 and further improves the mIoU to 82.7 with pre-training on COCO, which is the best-performing method on VOC12 benchmark.”
- “Capable of running in real-time yet still yielding competitive accuracies.”
We propose a novel deep layer cascade (LC) method to improve the accuracy and speed of semantic segmentation. Unlike the conventional model cascade (MC) that is composed of multiple independent models, LC treats a single deep model as a cascade of several sub-models. Earlier sub-models are trained to handle easy and confident regions, and they progressively feed forward harder regions to the next sub-model for processing. Convolutions are only calculated on these regions to reduce computations. The proposed method possesses several advantages. First, LC classifies most of the easy regions in the shallow stage and makes the deeper stages focus on a few hard regions. Such an adaptive and 'difficulty-aware' learning improves segmentation performance. Second, LC accelerates both training and testing of the deep network thanks to early decisions in the shallow stage. Third, in comparison to MC, LC is an end-to-end trainable framework, allowing joint learning of all sub-models. We evaluate our method on PASCAL VOC and Cityscapes datasets, achieving state-of-the-art performance and fast speed.
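To make the cascade idea concrete, here is a minimal NumPy sketch of the control flow described above. This is not the paper's implementation: in the real model the stages are sub-networks of one jointly trained CNN and convolutions are computed only on the hard regions, whereas here each stage is a hypothetical callable that returns per-pixel class probabilities, and the `threshold` parameter is an assumed confidence cutoff for deciding which pixels are "easy."

```python
import numpy as np

def layer_cascade_segment(image, stages, threshold=0.95):
    """Assign a label to every pixel using a cascade of stages.

    `stages` is a list of hypothetical callables, each mapping the image
    to an (H, W, C) array of per-pixel class probabilities. Early stages
    finalize labels for pixels they are confident about; only the
    remaining "hard" pixels are left for the next stage to decide.
    """
    H, W = image.shape[:2]
    labels = np.full((H, W), -1, dtype=int)   # -1 means "not yet decided"
    hard = np.ones((H, W), dtype=bool)        # every pixel starts as hard

    for i, stage in enumerate(stages):
        probs = stage(image)                  # (H, W, C) probabilities
        conf = probs.max(axis=-1)             # per-pixel confidence
        pred = probs.argmax(axis=-1)          # per-pixel best class

        if i < len(stages) - 1:
            # Intermediate stage: only commit to confident pixels.
            easy = hard & (conf >= threshold)
        else:
            # Final stage must decide everything that is left.
            easy = hard

        labels[easy] = pred[easy]
        hard &= ~easy
        if not hard.any():                    # early exit: nothing hard left
            break
    return labels


# Tiny demo with two dummy stages: stage 1 is confident everywhere
# except one ambiguous pixel, which stage 2 then resolves.
def stage1(x):
    p = np.zeros((4, 4, 2))
    p[..., 0] = 0.99                          # confident class-0 everywhere
    p[0, 0] = [0.5, 0.5]                      # one ambiguous (hard) pixel
    return p

def stage2(x):
    p = np.zeros((4, 4, 2))
    p[..., 1] = 1.0                           # resolves leftovers as class 1
    return p

img = np.zeros((4, 4, 3))
labels = layer_cascade_segment(img, [stage1, stage2], threshold=0.9)
```

In this toy run, stage 1 labels 15 of the 16 pixels and stage 2 only has to process the single ambiguous one, which is the source of the speedup the paper reports.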
Read the paper (pdf) »