NDNet: Spacewise Multiscale Representation Learning via Neighbor Decoupling for Real-Time Driving Scene Parsing.
IEEE Transactions on Neural Networks and Learning Systems (IF 10.2) Pub Date: 2024-06-03, DOI: 10.1109/tnnls.2022.3221745
Shu Li, Qingqing Yan, Xun Zhou, Deming Wang, Chengju Liu, Qijun Chen
As a safety-critical application, autonomous driving requires high-quality semantic segmentation and real-time performance for deployment. Existing methods commonly suffer from information loss and a massive computational burden due to high-resolution input-output and multiscale learning schemes, which run counter to real-time requirements. In contrast to the channelwise information modeling commonly adopted by modern networks, in this article we propose NDNet, a real-time driving scene parsing framework built from the novel perspective of spacewise neighbor decoupling (ND) and neighbor coupling (NC). We first define and implement the reversible operations ND and NC, which realize lossless resolution conversion by sampling and collating complementary thumbnails to facilitate spatial modeling. Based on ND and NC, we further propose three modules, namely the local capturer and global dependence builder (LCGB), the spacewise multiscale feature extractor (SMFE), and the high-resolution semantic generator (HSG), which together form the full NDNet pipeline. The LCGB serves as a stem block that preprocesses the large-scale input for fast yet lossless resolution reduction and extracts initial features with global context. The SMFE then performs dense feature extraction and obtains rich multiscale features in the spatial dimension with less computational overhead. For high-resolution semantic output, the HSG is designed for fast resolution reconstruction and adaptive amendment of semantic confusion. Experiments show the superiority of the proposed method: NDNet achieves state-of-the-art performance on the Cityscapes benchmark, reporting 76.47% mIoU at 240+ frames/s and 78.8% mIoU at 150+ frames/s. Code is available at https://github.com/LiShuTJ/NDNet.
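The abstract describes ND and NC as a reversible pair: ND samples a high-resolution map into complementary low-resolution thumbnails, and NC interleaves them back without losing any pixels. The paper's exact implementation is not reproduced here; the following is a minimal NumPy sketch of that strided-sampling idea, with the function names `neighbor_decouple` and `neighbor_couple` chosen for illustration.

```python
import numpy as np

def neighbor_decouple(x, s=2):
    """Split an (H, W) map into s*s complementary thumbnails by strided
    sampling of neighboring pixels; no information is discarded."""
    return [x[i::s, j::s] for i in range(s) for j in range(s)]

def neighbor_couple(thumbs, s=2):
    """Inverse of neighbor_decouple: interleave the thumbnails back
    into a single full-resolution map."""
    h, w = thumbs[0].shape
    out = np.empty((h * s, w * s), dtype=thumbs[0].dtype)
    for k, t in enumerate(thumbs):
        i, j = divmod(k, s)   # thumbnail k holds pixels at offset (i, j)
        out[i::s, j::s] = t
    return out

x = np.arange(16).reshape(4, 4)
thumbs = neighbor_decouple(x)                      # four 2x2 thumbnails
assert np.array_equal(neighbor_couple(thumbs), x)  # lossless round trip
```

Because the round trip is exact, resolution can be reduced for cheap multiscale processing and later restored without the interpolation loss incurred by pooling or strided convolution.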

Updated: 2022-11-21