Learning deep cross-scale feature propagation for indoor semantic segmentation
ISPRS Journal of Photogrammetry and Remote Sensing ( IF 10.6 ) Pub Date : 2021-04-21 , DOI: 10.1016/j.isprsjprs.2021.03.023
Linxi Huan , Xianwei Zheng , Shengjun Tang , Jianya Gong

Indoor semantic segmentation is a long-standing vision task that has recently been advanced by convolutional neural networks (CNNs), but it remains challenging due to the heavy occlusion and large scale variation of indoor scenes. Existing CNN-based methods mainly focus on using auxiliary depth data to enrich features extracted from RGB images; hence, they pay less attention to exploiting multi-scale information in the extracted features, which is essential for distinguishing objects in highly cluttered indoor scenes. This paper proposes a deep cross-scale feature propagation network (CSNet) to effectively learn and fuse multi-scale features for robust semantic segmentation of indoor scene images. The proposed CSNet is deployed as an encoder-decoder engine. During encoding, CSNet propagates contextual information across scales and learns discriminative multi-scale features that are robust to large object scale variation and indoor occlusion. The decoder of CSNet then adaptively integrates the multi-scale encoded features with fusion supervision at all scales to generate the target semantic segmentation prediction. Extensive experiments conducted on two challenging benchmarks demonstrate that CSNet can effectively learn multi-scale representations for robust indoor semantic segmentation, achieving outstanding performance with mIoU scores of 51.5 and 50.8 on the NYUDv2 and SUN-RGBD datasets, respectively.
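The abstract does not include a reference implementation, and CSNet's actual propagation and fusion modules are learned CNN layers. As a purely illustrative sketch of the general idea of fusing multi-scale encoder features at a common resolution with learnable per-scale fusion weights, one could write (all shapes, names, and the scalar-weight fusion scheme here are hypothetical simplifications, not the authors' method):

```python
import numpy as np

def upsample_nn(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def softmax(x, axis=0):
    """Numerically stable softmax along one axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_multiscale(features, weight_logits):
    """Fuse encoder features from several scales into one map.

    features:      list of (C, H_i, W_i) maps at strides 1, 2, 4, ...
                   (finest scale first).
    weight_logits: per-scale scalar fusion logits; in a real network
                   these would be learned, here they are fixed inputs.
    Returns a (C, H, W) map at the finest resolution.
    """
    target_h = features[0].shape[1]
    # Bring every scale to the finest resolution before fusing.
    ups = [upsample_nn(f, target_h // f.shape[1]) for f in features]
    w = softmax(np.asarray(weight_logits, dtype=float))
    return sum(wi * u for wi, u in zip(w, ups))

rng = np.random.default_rng(0)
# Toy encoder outputs: 8 channels at strides 1, 2 and 4.
feats = [rng.standard_normal((8, 16 // s, 16 // s)) for s in (1, 2, 4)]
fused = fuse_multiscale(feats, weight_logits=[0.5, 0.3, 0.2])
print(fused.shape)  # (8, 16, 16)
```

In the paper, the decoder additionally applies fusion supervision at every scale (a segmentation loss on each per-scale prediction), which this fixed-weight sketch does not model.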




Updated: 2021-04-21