AGLNet: Towards real-time semantic segmentation of self-driving images via attention-guided lightweight network
Applied Soft Computing (IF 7.2), Pub Date: 2020-09-02, DOI: 10.1016/j.asoc.2020.106682
Quan Zhou, Yu Wang, Yawen Fan, Xiaofu Wu, Suofei Zhang, Bin Kang, Longin Jan Latecki

The extensive computational burden of convolutional neural networks (CNNs) limits their use on edge devices for image semantic segmentation, a task that plays a significant role in many real-world applications such as augmented reality, robotics, and self-driving. To address this problem, this paper presents an attention-guided lightweight network, AGLNet, which employs an encoder–decoder architecture for real-time semantic segmentation. The encoder adopts a novel residual module to abstract feature representations, in which two operations, channel split and channel shuffle, greatly reduce computation cost while maintaining high segmentation accuracy. In the decoder, instead of complicated dilated convolutions and hand-designed architectures, two types of attention mechanism upsample features to match the input resolution. A factorized attention pyramid module (FAPM) explores hierarchical spatial attention from the high-level output while keeping the number of model parameters small, and a global attention upsample module (GAUM) provides global guidance for high-level features to delineate object shapes and boundaries. Comprehensive experiments demonstrate that our approach achieves state-of-the-art speed–accuracy trade-offs on three self-driving datasets: Cityscapes, CamVid, and Mapillary Vistas. AGLNet achieves 71.3%, 69.4%, and 30.7% mean IoU on these datasets with only 1.12M model parameters, while running at 52 FPS, 90 FPS, and 53 FPS, respectively, on a single GTX 1080Ti GPU. Our code is open-source and available at https://github.com/xiaoyufenfei/Efficient-Segmentation-Networks.
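The encoder's channel-split-and-shuffle residual module can be sketched in PyTorch roughly as follows. This is a minimal illustration of the split/shuffle idea, not the authors' exact block: the layer widths, the plain 3×3 convolutions, and the `SplitShuffleResidual` name are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def channel_shuffle(x, groups=2):
    # Interleave channels so information mixes across the split branches.
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

class SplitShuffleResidual(nn.Module):
    # Hypothetical residual unit: half the channels pass through untouched
    # while the other half is transformed, so convolution cost drops sharply
    # compared with a full-width residual block.
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.branch = nn.Sequential(
            nn.Conv2d(half, half, 3, padding=1, bias=False),
            nn.BatchNorm2d(half),
            nn.ReLU(inplace=True),
            nn.Conv2d(half, half, 3, padding=1, bias=False),
            nn.BatchNorm2d(half),
        )

    def forward(self, x):
        left, right = x.chunk(2, dim=1)                    # channel split
        right = self.branch(right)                         # transform one half
        out = F.relu(torch.cat([left, right], dim=1) + x)  # residual add
        return channel_shuffle(out)                        # channel shuffle

x = torch.randn(2, 64, 64, 128)
assert SplitShuffleResidual(64)(x).shape == x.shape  # smoke test
```

Running the two 3×3 convolutions on only half the channels is where the computation saving comes from, and the final shuffle keeps the untouched half from being starved of new information when such blocks are stacked.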

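The global attention upsample module can be sketched in the same spirit: global average pooling of the high-level features produces a channel-attention vector that gates the low-level skip features before fusion. This follows the generic global-attention-upsample pattern; the exact layout, the 1×1 gate, and the `GlobalAttentionUpsample` name are assumptions rather than the paper's definition of GAUM.

```python
import torch.nn as nn
import torch.nn.functional as F

class GlobalAttentionUpsample(nn.Module):
    # Hypothetical decoder unit: high-level global context gates the
    # low-level features channel-wise, then the upsampled high-level
    # features are added, helping recover object shapes and boundaries.
    def __init__(self, low_ch, high_ch):
        super().__init__()
        self.reduce = nn.Conv2d(low_ch, high_ch, 3, padding=1, bias=False)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                  # global average pooling
            nn.Conv2d(high_ch, high_ch, 1, bias=False),
            nn.Sigmoid(),                             # per-channel weights in (0, 1)
        )

    def forward(self, low, high):
        low = self.reduce(low)                        # match channel counts
        att = self.gate(high)                         # (N, C, 1, 1) attention
        high = F.interpolate(high, size=low.shape[2:],
                             mode='bilinear', align_corners=False)
        return high + low * att                       # gated skip fusion
```

Because the attention vector comes from global pooling, the gate costs a single 1×1 convolution regardless of input resolution, which fits the real-time, low-parameter budget the abstract emphasizes.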


