Small Object Augmentation of Urban Scenes for Real-Time Semantic Segmentation,IEEE Transactions on Image Processing

IEEE Transactions on Image Processing (IF 10.6), Pub Date: 2020-03-18, DOI: 10.1109/tip.2020.2976856
Zhengeng Yang, Hongshan Yu, Mingtao Feng, Wei Sun, Xuefei Lin, Mingui Sun, Zhi-Hong Mao, Ajmal Mian

Semantic segmentation is a key step in scene understanding for autonomous driving. Although deep learning has significantly improved the segmentation accuracy, current high-quality models such as PSPNet and DeepLabV3 are inefficient given their complex architectures and reliance on multi-scale inputs. Thus, it is difficult to apply them to real-time or practical applications. On the other hand, existing real-time methods cannot yet produce satisfactory results on small objects such as traffic lights, which are imperative to safe autonomous driving. In this paper, we improve the performance of real-time semantic segmentation from two perspectives, methodology and data. Specifically, we propose a real-time segmentation model coined Narrow Deep Network (NDNet) and build a synthetic dataset by inserting additional small objects into the training images. The proposed method achieves 65.7% mean intersection over union (mIoU) on the Cityscapes test set with only 8.4G floating-point operations (FLOPs) on $1024\times 2048$ inputs. Furthermore, by re-training the existing PSPNet and DeepLabV3 models on our synthetic dataset, we obtained an average 2% mIoU improvement on small objects.
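To make the data-side idea concrete, the following is a minimal sketch of inserting one small object into a training image and its label map. It is not the authors' pipeline, which the abstract does not describe in detail. The function name `paste_small_object`, its parameters, the fixed paste position, and the class id used below are all assumptions introduced for illustration.

```python
import numpy as np


def paste_small_object(image, label, patch, patch_mask, class_id, top, left):
    """Paste a small object crop into an image and update the label map.

    Hypothetical helper for illustration only.
    image:      (H, W, 3) uint8 training image
    label:      (H, W) integer class map
    patch:      (h, w, 3) uint8 crop containing the object
    patch_mask: (h, w) bool array, True where the crop shows the object
    class_id:   label value to write for the pasted pixels
    top, left:  position of the crop's upper-left corner in the image
    Returns new arrays; the inputs are left unchanged.
    """
    h, w = patch_mask.shape
    H, W = label.shape
    if top < 0 or left < 0 or top + h > H or left + w > W:
        raise ValueError("patch does not fit inside the image at this position")

    out_image = image.copy()
    out_label = label.copy()

    # Slices are views, so writing through them modifies the copies.
    region_image = out_image[top:top + h, left:left + w]
    region_label = out_label[top:top + h, left:left + w]

    # Only pixels covered by the object mask are overwritten.
    region_image[patch_mask] = patch[patch_mask]
    region_label[patch_mask] = class_id
    return out_image, out_label


if __name__ == "__main__":
    image = np.zeros((8, 8, 3), dtype=np.uint8)
    label = np.zeros((8, 8), dtype=np.int64)
    patch = np.full((2, 2, 3), 255, dtype=np.uint8)
    patch_mask = np.array([[True, False], [True, True]])
    # class_id=6 is an arbitrary placeholder value.
    new_image, new_label = paste_small_object(
        image, label, patch, patch_mask, class_id=6, top=3, left=4
    )
    print(new_label)
```

A realistic augmentation would also need to choose plausible positions and scales for the pasted objects; this sketch leaves those choices to the caller.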

Updated: 2020-04-22
