Semantic segmentation using stride spatial pyramid pooling and dual attention decoder
Pattern Recognition (IF 8), Pub Date: 2020-11-01, DOI: 10.1016/j.patcog.2020.107498
Chengli Peng, Jiayi Ma

Abstract: Semantic segmentation is an end-to-end task that requires both semantic and spatial accuracy. It is important for deep learning-based segmentation methods to make effective use of both the high-level feature map, which is rich in semantic information, and the low-level feature map, which carries accurate spatial information. However, existing segmentation networks typically cannot take full advantage of these two kinds of feature maps, which leads to inferior performance. This paper addresses this challenge by introducing two novel structures. First, we propose a structure called stride spatial pyramid pooling (SSPP) to capture multiscale semantic information from the high-level feature map. Compared with existing pyramid pooling methods based on atrous convolution, SSPP gathers more information from the high-level feature map with faster inference, significantly improving how efficiently the high-level feature map is used. Second, we propose a dual attention decoder, consisting of a channel attention branch and a spatial attention branch, to exploit the high- and low-level feature maps simultaneously. The dual attention decoder yields a more “semantic” low-level feature map and a high-level feature map with more accurate spatial information, which bridges the gap between these two kinds of feature maps and benefits their fusion. We evaluate the proposed model on several publicly available semantic image segmentation benchmarks, including PASCAL VOC 2012, Cityscapes, and COCO-Stuff. Qualitative and quantitative results demonstrate that our method achieves state-of-the-art performance.
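The abstract describes both modules only at a high level, so the NumPy sketch below is our own illustration under stated assumptions, not the authors' implementation. We assume that each SSPP-like branch average-pools the high-level map with a different stride (1, 2, 4), applies a 1×1 projection on the smaller grid, and upsamples back, in place of the parallel atrous convolutions of ASPP-style modules. For the decoder, we assume the channel branch turns global context from the high-level map into per-channel weights for the low-level map, while the spatial branch turns the low-level map into a per-pixel weight map for the upsampled high-level map. All names, strides, and weight shapes (`stride_pyramid_pool`, `dual_attention_fuse`, `w_channel`, and so on) are hypothetical.

```python
import numpy as np

# Hypothetical sketch: these helpers illustrate the general idea of stride-based
# pyramid pooling and a two-branch attention decoder. They are NOT the authors'
# SSPP or dual attention decoder; strides, shapes and weights are assumptions.
# Feature maps use a (channels, height, width) layout without a batch axis.


def avg_pool(x, s):
    """Non-overlapping s x s average pooling; H and W must be divisible by s."""
    c, h, w = x.shape
    return x.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4))


def upsample_nearest(x, s):
    """Nearest-neighbour upsampling by an integer factor s along H and W."""
    return x.repeat(s, axis=1).repeat(s, axis=2)


def conv1x1(x, weight):
    """1x1 convolution with a (C_out, C_in) weight matrix."""
    return np.einsum("oc,chw->ohw", weight, x)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def stride_pyramid_pool(high, weights, strides=(1, 2, 4)):
    """Assumed SSPP-like module: each branch pools with a different stride,
    projects with a 1x1 conv on the smaller grid, then upsamples back."""
    branches = []
    for s, w in zip(strides, weights):
        y = conv1x1(avg_pool(high, s), w)  # projection runs on an (H/s) x (W/s) grid
        branches.append(upsample_nearest(y, s))
    return np.concatenate(branches, axis=0)


def dual_attention_fuse(high, low, w_channel):
    """Assumed two-branch fusion of a coarse high-level map (C_h, h, w) and a
    fine low-level map (C_l, r*h, r*w); w_channel has shape (C_l, C_h)."""
    r = low.shape[1] // high.shape[1]
    # Channel branch: global context of the high-level map reweights low-level channels.
    channel_weights = sigmoid(w_channel @ high.mean(axis=(1, 2)))  # (C_l,)
    low_semantic = low * channel_weights[:, None, None]
    # Spatial branch: a per-pixel map from the low-level features reweights
    # the upsampled high-level map.
    spatial_weights = sigmoid(low.mean(axis=0))  # (r*h, r*w)
    high_spatial = upsample_nearest(high, r) * spatial_weights[None, :, :]
    return np.concatenate([low_semantic, high_spatial], axis=0)


rng = np.random.default_rng(0)
high = rng.standard_normal((8, 8, 8))   # coarse high-level map: 8 channels, 8x8
low = rng.standard_normal((4, 32, 32))  # fine low-level map: 4 channels, 32x32
pyramid_weights = [rng.standard_normal((2, 8)) for _ in range(3)]

context = stride_pyramid_pool(high, pyramid_weights)
fused = dual_attention_fuse(context, low, rng.standard_normal((4, 6)))
print(context.shape, fused.shape)  # (6, 8, 8) (10, 32, 32)
```

In this sketch a stride-s branch runs its 1×1 projection on a grid with s² times fewer positions, whereas atrous convolutions enlarge the receptive field while keeping the full grid. That is one plausible way to read the abstract's speed claim, not a description of how SSPP actually achieves it.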

Updated: 2020-11-01