Towards closing the gap in weakly supervised semantic segmentation with DCNNs: Combining local and global models
Computer Vision and Image Understanding (IF 4.3) Pub Date: 2021-05-08, DOI: 10.1016/j.cviu.2021.103209
Christoph Mayer, Radu Timofte, Grégory Paul

Generating training sets for Deep Convolutional Neural Networks (DCNNs) is a bottleneck for modern real-world applications. The task is especially demanding when annotating training data is costly, as in semantic segmentation. In the literature, there is still a gap between the performance of a network trained on full annotations and one trained on weak annotations. In this paper, we establish a simple and natural strategy to measure this gap and to identify the components needed to reduce it. On scribble annotations, we set new state-of-the-art results: an mIoU of 75.6% without, and 75.7% with, CRF post-processing. We reduce the gap by 64.2%, whereas the current state of the art reduces it by only 57.5%. Thanks to a formal reformulation of the weak supervision problem, a systematic study of the components involved, and an original experimental strategy, we unravel a counter-intuitive mechanism analogous to the philosophy of ensemble learning. The strategy is simple and generalizes readily to other weakly supervised scenarios: averaging poor local predicted annotations with a generic naive baseline and reusing them to train a DCNN yields new state-of-the-art results. We show that our strategy effortlessly accommodates other pixel-level weak annotations, such as bounding boxes, and remains competitive.
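A minimal sketch of the averaging step described in the abstract, in Python with NumPy: per-pixel blending of a local model's predicted annotation probabilities with a generic baseline before reusing the result as training targets for a DCNN. The function name, the uniform baseline, and the mixing weight alpha are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch (not the authors' code) of averaging poor local
# predicted annotations with a generic naive baseline.
import numpy as np

def average_annotations(local_probs: np.ndarray,
                        baseline_probs: np.ndarray,
                        alpha: float = 0.5) -> np.ndarray:
    """Convex combination of two (C, H, W) per-pixel class distributions."""
    assert local_probs.shape == baseline_probs.shape
    blended = alpha * local_probs + (1.0 - alpha) * baseline_probs
    return blended / blended.sum(axis=0, keepdims=True)  # guard against drift

if __name__ == "__main__":
    C, H, W = 21, 4, 4                                   # e.g. 21 PASCAL VOC classes, toy image
    rng = np.random.default_rng(0)
    local = rng.random((C, H, W))
    local /= local.sum(axis=0, keepdims=True)            # noisy local prediction
    baseline = np.full((C, H, W), 1.0 / C)               # placeholder naive baseline
    pseudo = average_annotations(local, baseline, alpha=0.5)
    hard_labels = pseudo.argmax(axis=0)                  # (H, W) pseudo-annotation map
    print(hard_labels)

In practice the blended maps would be used as soft or hard pseudo-annotations when retraining the segmentation network; the actual choice of baseline and mixing weight is specified in the paper itself.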



Updated: 2021-05-18