DPSNet: Multitask Learning Using Geometry Reasoning for Scene Depth and Semantics
IEEE Transactions on Neural Networks and Learning Systems (IF 10.4). Pub Date: 2021-09-16. DOI: 10.1109/tnnls.2021.3107362
Junning Zhang, Qunxing Su, Bo Tang, Cheng Wang, Yining Li

Multitask joint learning continues to gain attention as a paradigm shift and has shown promising performance in many applications. Depth estimation and semantic understanding from monocular images remain challenging problems in computer vision. While other joint learning frameworks establish the relationship between semantics and depth from stereo pairs, their lack of camera-motion learning leaves them unable to model the geometric structure of the image scene. In this article, we take a further step by proposing a multitask learning method, DPSNet, which jointly performs depth estimation, camera pose estimation, and semantic scene segmentation. Our core idea for depth and camera pose prediction is a rigid semantic consistency loss that overcomes the limitation that moving pixels impose on image-reconstruction-based supervision, and we further infer the segmentation of moving instances from it. In addition, the proposed model performs semantic segmentation by reasoning about the geometric correspondences between pixel-level semantic outputs and semantic labels at multiscale resolutions. Experiments on open-source datasets and a video dataset captured on a micro smart car show the effectiveness of each component of DPSNet, and DPSNet achieves state-of-the-art results on all three tasks compared with leading published methods. All our models and code are available at https://github.com/jn-z/DPSNet.
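The abstract does not give the exact formulation of the rigid semantic consistency loss, but the stated idea is that, under the predicted depth and rigid camera motion, semantics of static pixels should agree across frames, so large residuals hint at moving instances. Below is a minimal PyTorch sketch of that idea, assuming pinhole intrinsics K, a predicted target-frame depth map, a predicted relative pose T_ts, and per-pixel semantic logits for both frames; all function and tensor names here are illustrative assumptions, not the authors' API.

```python
# A minimal sketch (not the authors' code) of a rigid semantic consistency
# residual: warp source-frame semantics into the target frame via predicted
# depth and rigid pose, then measure per-pixel disagreement.
import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    """Lift target pixels to 3-D camera coordinates using predicted depth.
    depth: (B,1,H,W), K_inv: (B,3,3). Returns (B,3,H*W)."""
    b, _, h, w = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=depth.dtype, device=depth.device),
        torch.arange(w, dtype=depth.dtype, device=depth.device),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(1, 3, -1)
    cam = K_inv @ pix                       # rays through each pixel
    return cam * depth.reshape(b, 1, -1)    # scale rays by predicted depth

def rigid_semantic_residual(sem_logits_t, sem_logits_s, depth_t, T_ts, K):
    """sem_logits_*: (B,C,H,W) semantic logits; depth_t: (B,1,H,W);
    T_ts: (B,4,4) rigid transform target->source; K: (B,3,3) intrinsics.
    Returns a (B,H,W) map; static pixels should score low, moving ones high."""
    b, c, h, w = sem_logits_t.shape
    cam_t = backproject(depth_t, torch.inverse(K))                 # (B,3,HW)
    ones = torch.ones(b, 1, h * w, dtype=cam_t.dtype, device=cam_t.device)
    cam_s = (T_ts @ torch.cat([cam_t, ones], dim=1))[:, :3]        # rigid motion
    pix_s = K @ cam_s
    pix_s = pix_s[:, :2] / pix_s[:, 2:].clamp(min=1e-6)            # perspective divide
    # Normalise projected coordinates to [-1, 1] for grid_sample.
    gx = 2.0 * pix_s[:, 0] / (w - 1) - 1.0
    gy = 2.0 * pix_s[:, 1] / (h - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1).reshape(b, h, w, 2)
    sem_s_warped = F.grid_sample(sem_logits_s, grid,
                                 align_corners=True, padding_mode="border")
    # Per-pixel divergence between warped source and target semantics.
    log_p_t = F.log_softmax(sem_logits_t, dim=1)
    p_s = F.softmax(sem_s_warped, dim=1)
    return F.kl_div(log_p_t, p_s, reduction="none").sum(1)         # (B,H,W)
```

In a training loop of this shape, averaging the residual map would give a scalar consistency loss, while thresholding it per pixel is one plausible way to flag moving-instance candidates; the multiscale segmentation term the abstract mentions would then be a standard supervised loss applied at several output resolutions. Both choices are assumptions made for this sketch.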

Updated: 2021-09-16