Real-Time Semantic Segmentation via Auto Depth, Downsampling Joint Decision and Feature Aggregation
International Journal of Computer Vision (IF 19.5), Pub Date: 2021-02-19, DOI: 10.1007/s11263-021-01433-3
Peng Sun, Jiaxiang Wu, Songyuan Li, Peiwen Lin, Junzhou Huang, Xi Li

Real-time semantic segmentation places stringent limits on computational resources, so most approaches focus on hand-crafted designs of light-weight segmentation networks. To automate model design, Neural Architecture Search (NAS) has been introduced to search for the optimal building blocks of networks. However, the network depth, downsampling strategy, and feature aggregation method are still set in advance and cannot be adjusted during the search, even though these key properties are highly correlated and essential for a strong real-time segmentation model. In this paper, we propose a joint search framework, called AutoRTNet, to automate all three of these properties in semantic segmentation. Specifically, we propose hyper-cells to jointly decide the network depth and the downsampling strategy via a novel cell-level pruning process. Furthermore, we propose an aggregation cell to achieve automatic multi-scale feature aggregation. Extensive experimental results on the Cityscapes and CamVid datasets demonstrate that the proposed AutoRTNet achieves a new state-of-the-art trade-off between accuracy and speed. Notably, AutoRTNet achieves 73.9% mIoU on Cityscapes at 110.0 FPS on an NVIDIA TitanXP GPU with input images at a resolution of \(768 \times 1536\).
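To put the reported speed in concrete terms, the short sketch below converts the 110.0 FPS figure into an average per-frame time and a pixel throughput at the stated \(768 \times 1536\) resolution. It is plain arithmetic on the numbers in the abstract, not part of the authors' method. The function names are hypothetical, and the conversion assumes frames are processed one at a time, which the abstract does not state.

```python
def frame_latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds, assuming one frame at a time."""
    return 1000.0 / fps


def pixels_per_second(fps: float, height: int, width: int) -> float:
    """Number of input pixels processed per second at the given frame rate."""
    return fps * height * width


# Figures taken from the abstract.
FPS = 110.0
HEIGHT, WIDTH = 768, 1536

latency = frame_latency_ms(FPS)
throughput = pixels_per_second(FPS, HEIGHT, WIDTH)

print(f"{latency:.2f} ms per frame")                     # 9.09 ms per frame
print(f"{throughput / 1e6:.1f} megapixels per second")   # 129.8 megapixels per second
```

Under that assumption, each frame takes roughly 9 ms on average, well under the 33 ms budget of a 30 FPS video stream.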




Updated: 2021-02-19