Incorporating Network Built-in Priors in Weakly-Supervised Semantic Segmentation,IEEE Transactions on Pattern Analysis and Machine Intelligence

IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 23.6), Pub Date: 2017-06-08, DOI: 10.1109/tpami.2017.2713785
Fatemeh Sadat Saleh, Mohammad Sadegh Aliakbarian, Mathieu Salzmann, Lars Petersson, Jose M. Alvarez, Stephen Gould

Pixel-level annotations are expensive and time-consuming to obtain. Hence, weak supervision using only image tags could have a significant impact on semantic segmentation. Recently, CNN-based methods have been proposed that fine-tune pre-trained networks using image tags. Without additional information, this leads to poor localization accuracy. This problem, however, has been alleviated by using objectness priors to generate foreground/background masks. Unfortunately, these priors either require pixel-level annotations or bounding boxes, or still yield inaccurate object boundaries. Here, we propose a novel method to extract accurate masks from networks pre-trained for the task of object recognition, thus forgoing external objectness modules. We first show how foreground/background masks can be obtained from the activations of higher-level convolutional layers of a network. We then show how to obtain multi-class masks by fusing these foreground/background masks with information extracted from a weakly-supervised localization network. Our experiments show that exploiting these masks in conjunction with a weakly-supervised training loss yields state-of-the-art results for tag-based weakly-supervised semantic segmentation.
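The abstract does not say how activations become a mask or how the fusion is performed, so the following is only a minimal, hypothetical sketch of the two-step idea, not the authors' method. It assumes (1) that a foreground score can be approximated by summing the channels of a higher-level convolutional feature map of shape (C, H, W) and thresholding at the mean, and (2) that the weakly-supervised localization network supplies per-class score maps of shape (K, H, W) (for instance, class activation maps). The function names `foreground_mask` and `multiclass_mask` are illustrative inventions.

```python
from typing import Optional, Sequence

import numpy as np


def foreground_mask(activations: np.ndarray, threshold: Optional[float] = None) -> np.ndarray:
    """Hypothetical fg/bg mask from conv activations of shape (C, H, W).

    Assumed heuristic: sum over channels, min-max normalise to [0, 1],
    then threshold (default: the mean normalised score).
    """
    if activations.ndim != 3:
        raise ValueError("expected activations of shape (C, H, W)")
    score = activations.sum(axis=0)
    lo, hi = score.min(), score.max()
    score = (score - lo) / (hi - lo + 1e-8)  # epsilon avoids division by zero
    if threshold is None:
        threshold = float(score.mean())
    return score > threshold  # boolean (H, W) foreground mask


def multiclass_mask(fg: np.ndarray, class_maps: np.ndarray, tags: Sequence[int]) -> np.ndarray:
    """Hypothetical fusion of a fg mask with per-class maps of shape (K, H, W).

    Only classes present in the image tags compete for each foreground pixel.
    Output labels: 0 = background, k + 1 = class index k in class_maps.
    """
    if len(tags) == 0:
        return np.zeros(fg.shape, dtype=np.int64)
    tag_ids = np.asarray(tags, dtype=np.int64)
    restricted = class_maps[tag_ids]                 # (T, H, W): tagged classes only
    best = tag_ids[restricted.argmax(axis=0)]        # (H, W): winning class index
    return np.where(fg, best + 1, 0).astype(np.int64)
```

A small usage example: with activations peaking at the centre and top-middle pixels, and tags `[0, 2]`, class 1 is ignored even though its map is highest everywhere, because it is not among the image tags.

```python
acts = np.zeros((4, 3, 3)); acts[:, 1, 1] = 5.0; acts[:, 0, 1] = 3.0
fg = foreground_mask(acts)
maps = np.zeros((3, 3, 3)); maps[1] = 10.0; maps[0, 0, 1] = 2.0; maps[2, 1, 1] = 2.0
print(multiclass_mask(fg, maps, [0, 2]).tolist())  # [[0, 1, 0], [0, 3, 0], [0, 0, 0]]
```

Restricting the argmax to tagged classes is what makes image-level tags useful here: they prune the label space per image, while the foreground mask decides which pixels receive a class at all.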

Updated: 2018-05-05
