Semantic Image Segmentation with Improved Position Attention and Feature Fusion
Neural Processing Letters (IF 3.1) Pub Date: 2020-05-12, DOI: 10.1007/s11063-020-10240-9
Hegui Zhu , Yan Miao , Xiangde Zhang

The encoder–decoder structure is a universal method for semantic image segmentation. However, as the depth of a convolutional neural network (CNN) increases, some important image information is lost and the correlation between arbitrary pixels degrades. This paper designs a novel image segmentation model to obtain dense feature maps and improve segmentation results. In the encoder stage, we employ ResNet-50 to extract features and then add a spatial pyramid pooling (SPP) module to achieve multi-scale feature fusion. In the decoder stage, we provide an improved position attention module that integrates contextual information effectively and removes trivial information by changing the way the attention matrix is constructed. Furthermore, we also propose a feature fusion structure that generates dense feature maps by performing an element-wise sum on the upsampled features and the corresponding encoder features. Simulation results show that the average accuracy and mIoU on the CamVid dataset reach 90.7% and 63.1%, respectively, verifying the effectiveness and reliability of the proposed method.
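The two decoder-side ideas in the abstract can be illustrated with a minimal NumPy sketch. Note the hedges: this shows the *standard* position-attention formulation (every pixel attends to every other pixel) rather than the paper's improved attention-matrix construction, which is not specified in the abstract, and nearest-neighbour upsampling stands in for the decoder's actual upsampling. All function names and tensor shapes here are illustrative, not from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_attention(feat):
    # feat: (C, H, W). Standard position attention: build an (N, N)
    # pixel-pair affinity matrix over the N = H*W positions, normalise
    # it with softmax, and aggregate context for every pixel.
    # (The paper changes how this attention matrix is constructed;
    # this is only the baseline formulation.)
    C, H, W = feat.shape
    flat = feat.reshape(C, H * W)        # (C, N)
    energy = flat.T @ flat               # (N, N) pixel-pair similarity
    attn = softmax(energy, axis=-1)      # rows sum to 1
    out = flat @ attn.T                  # context-weighted sum per pixel
    return out.reshape(C, H, W) + feat   # residual connection

def upsample2x(feat):
    # nearest-neighbour 2x upsampling, a stand-in for the decoder's
    # learned upsampling
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fuse(decoder_feat, encoder_feat):
    # feature fusion structure: element-wise sum of the upsampled
    # decoder features with the corresponding encoder features
    return upsample2x(decoder_feat) + encoder_feat

rng = np.random.default_rng(0)
dec = rng.standard_normal((8, 4, 4))   # toy decoder feature map
enc = rng.standard_normal((8, 8, 8))   # toy encoder feature map (2x resolution)
fused = fuse(position_attention(dec), enc)
print(fused.shape)  # (8, 8, 8)
```

The element-wise sum keeps the channel count unchanged while injecting the encoder's higher-resolution detail into the upsampled decoder features, which is what yields the dense feature maps the abstract describes.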

Updated: 2020-05-12