Efficient pyramid context encoding and feature embedding for semantic segmentation
Image and Vision Computing (IF 4.2), Pub Date: 2021-05-08, DOI: 10.1016/j.imavis.2021.104195
Mengyu Liu, Hujun Yin

For real-world applications of semantic segmentation, inference speed and memory usage are two important factors. To address these challenges, we propose a lightweight feature pyramid encoding network (FPENet) for semantic segmentation with a good trade-off between accuracy and speed. We use a series of feature pyramid encoding (FPE) blocks to encode context at multiple scales in the encoder. Each FPE block consists of different depthwise dilated convolutions that act as a spatial pyramid to extract features while reducing computational cost. During training, a one-shot neural architecture search algorithm is adopted to find the optimal structure for each FPE block from a large search space at a small search cost. After the encoder search, a mutual embedding upsample module, consisting of two attention blocks, is introduced in the decoder. Its encoder-decoder attention mechanism helps efficiently aggregate high-level semantic features and low-level spatial details. The proposed network outperforms existing real-time methods with fewer parameters and higher inference speed on the Cityscapes and CamVid benchmark datasets. Specifically, it achieves 72.3% mean IoU on the Cityscapes test set with only 0.4 M parameters at 192.6 FPS on an Nvidia Titan V100 GPU, and 73.4% mean IoU at 116.2 FPS when running on higher-resolution images.
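The FPE block described above lends itself to a compact implementation. Below is a minimal PyTorch-style sketch of such a block, assuming parallel depthwise 3x3 convolutions with increasing dilation rates followed by a pointwise fusion convolution and a residual connection; the specific dilation rates, channel handling, and normalization used in FPENet may differ from this illustration.

```python
import torch
import torch.nn as nn

class FPEBlockSketch(nn.Module):
    """Illustrative sketch of a feature pyramid encoding (FPE) style block.

    Parallel depthwise 3x3 convolutions with increasing dilation rates act as
    a spatial pyramid; a 1x1 convolution fuses the multi-scale features.
    The dilation rates and fusion scheme here are assumptions for
    illustration, not the exact FPENet configuration.
    """

    def __init__(self, channels, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # depthwise dilated 3x3 convolution (groups == channels)
                nn.Conv2d(channels, channels, kernel_size=3, padding=d,
                          dilation=d, groups=channels, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        # pointwise convolution fuses the concatenated pyramid features
        self.fuse = nn.Sequential(
            nn.Conv2d(channels * len(dilations), channels, kernel_size=1,
                      bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        pyramid = [branch(x) for branch in self.branches]
        out = self.fuse(torch.cat(pyramid, dim=1))
        return out + x  # residual connection


if __name__ == "__main__":
    block = FPEBlockSketch(channels=64)
    y = block(torch.randn(1, 64, 128, 256))
    print(y.shape)  # torch.Size([1, 64, 128, 256])
```

Because each branch is depthwise (groups equal to the channel count), its cost grows roughly linearly with the number of channels rather than quadratically, which is what keeps such a multi-scale pyramid lightweight enough for real-time use.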



Updated: 2021-05-17