Multi-scale Attention U-Net (MsAUNet): A Modified U-Net Architecture for Scene Segmentation
arXiv - CS - Artificial Intelligence. Pub Date: 2020-09-15, DOI: arXiv:2009.06911
Soham Chattopadhyay, Hritam Basak

Despite the growing success of convolutional neural networks (CNNs) in scene segmentation, standard models lack several important features, which can result in sub-optimal segmentation outputs. The widely used encoder-decoder architecture extracts and uses several redundant and low-level features at different steps and different scales. These networks also fail to capture the long-range dependencies of local features, which prevents them from producing discriminative feature maps for each semantic class in the segmented image. In this paper, we propose a novel multi-scale attention network for scene segmentation that exploits the rich contextual information in an image. Unlike the original U-Net architecture, we use attention gates that take the features from the encoder and the output of the pyramid pooling module as input; the resulting output is then concatenated with the up-sampled output of the previous pyramid-pooling layer and passed to the next layer. This network maps local features to their global counterparts with improved accuracy and emphasizes discriminative image regions by focusing only on relevant local features. We also propose a compound loss function that optimizes the IoU loss while fusing it with the Dice loss and a weighted cross-entropy loss, achieving an optimal solution at a faster convergence rate. We evaluated our model on two standard datasets, PascalVOC2012 and ADE20k, achieved mean IoU of 79.88% and 44.88% on the two datasets respectively, and compared our results with widely known models to demonstrate the superiority of our model.
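The compound loss described above fuses an IoU (Jaccard) loss with a Dice loss and a weighted cross-entropy loss. The abstract does not give the exact formulation or weighting, so the following is a minimal NumPy sketch of one common way such a fusion is built; the weighting coefficients `alpha`, `beta`, `gamma` and the class weights `w_pos`, `w_neg` are hypothetical, not the authors' values.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-7):
    # Soft Dice loss: 1 - 2|P∩T| / (|P| + |T|)
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def iou_loss(pred, target, eps=1e-7):
    # Soft IoU (Jaccard) loss: 1 - |P∩T| / |P∪T|
    inter = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - inter
    return 1.0 - (inter + eps) / (union + eps)

def weighted_ce(pred, target, w_pos=2.0, w_neg=1.0, eps=1e-7):
    # Per-pixel weighted binary cross-entropy (class weights are hypothetical)
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(w_pos * target * np.log(pred)
                           + w_neg * (1.0 - target) * np.log(1.0 - pred))))

def compound_loss(pred, target, alpha=1.0, beta=1.0, gamma=1.0):
    # Fuse the three terms; alpha/beta/gamma are assumed equal-weight here
    return (alpha * iou_loss(pred, target)
            + beta * dice_loss(pred, target)
            + gamma * weighted_ce(pred, target))

# Toy example: 2x2 map of predicted probabilities vs. binary ground truth
pred = np.array([[0.9, 0.1], [0.8, 0.2]])
target = np.array([[1.0, 0.0], [1.0, 0.0]])
print(round(compound_loss(pred, target), 4))
```

A perfect prediction drives both the Dice and IoU terms to zero, while the cross-entropy term supplies dense per-pixel gradients early in training, which is the usual motivation for fusing overlap-based and entropy-based losses.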

Updated: 2020-09-16