Shape Prior Guided Instance Disparity Estimation for 3D Object Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8). Pub Date: 2021-04-29. DOI: 10.1109/tpami.2021.3076678
Linghao Chen, Jiaming Sun, Yiming Xie, Siyu Zhang, Qing Shuai, Qinhong Jiang, Guofeng Zhang, Hujun Bao, Xiaowei Zhou

In this paper, we propose a novel system named Disp R-CNN for 3D object detection from stereo images. Many recent works solve this problem by first recovering point clouds with disparity estimation and then applying a 3D detector. In these methods, the disparity map is computed for the entire image, which is costly and fails to leverage category-specific priors. In contrast, we design an instance disparity estimation network (iDispNet) that predicts disparity only for pixels on objects of interest and learns a category-specific shape prior for more accurate disparity estimation. To address the challenge posed by the scarcity of disparity annotations for training, we propose to use a statistical shape model to generate dense disparity pseudo-ground-truth without the need for LiDAR point clouds, which makes our system more widely applicable. Experiments on the KITTI dataset show that, when LiDAR ground-truth is not used at training time, Disp R-CNN outperforms previous state-of-the-art methods based on stereo input by 20 percent in terms of average precision for all categories. The code and pseudo-ground-truth data are available at the project page: https://github.com/zju3dv/disprcnn.
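The instance-level idea pairs naturally with the standard stereo relation depth = f · baseline / disparity. Below is a minimal sketch, not the authors' released code, of how masked instance disparity can be lifted to an object point cloud; the function name and the calibration defaults (fx, cx, cy, baseline) are hypothetical KITTI-like placeholders.

```python
import numpy as np

def instance_points_from_disparity(disparity, mask,
                                   fx=721.5, cx=609.6, cy=172.9,
                                   baseline=0.54):
    """Back-project masked disparity pixels into a 3D point cloud.

    disparity: (H, W) float array; mask: (H, W) bool array marking the
    object's pixels. Calibration defaults are hypothetical KITTI-like
    values, not taken from the paper.
    """
    # Keep only pixels on the object that have a valid (positive) disparity.
    v, u = np.nonzero(mask & (disparity > 0))
    # Standard stereo relation: depth = focal_length * baseline / disparity.
    z = fx * baseline / disparity[v, u]
    # Pinhole back-projection (assumes fy ~= fx for brevity).
    x = (u - cx) * z / fx
    y = (v - cy) * z / fx
    return np.stack([x, y, z], axis=1)  # (N, 3) instance point cloud
```

Because only mask pixels enter the computation, the cost scales with object area rather than image area, which is the efficiency argument the abstract makes against full-image disparity.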

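The pseudo-ground-truth idea can be sketched in the same spirit: given surface points of a fitted statistical shape model posed in the left-camera frame, projecting them into the image and converting depth back to disparity yields dense supervision without LiDAR. This is a hedged illustration under that assumption; pseudo_gt_disparity and its input format are inventions for this sketch, not the paper's exact pipeline.

```python
import numpy as np

def pseudo_gt_disparity(points_cam, H, W,
                        fx=721.5, fy=721.5, cx=609.6, cy=172.9,
                        baseline=0.54):
    """Splat posed shape-model surface points into a disparity map.

    points_cam: (N, 3) points of a fitted statistical shape model in the
    left-camera frame (a hypothetical input standing in for the paper's
    per-instance shape fit).
    """
    X, Y, Z = points_cam.T
    keep = Z > 0                       # only points in front of the camera
    u = np.round(fx * X[keep] / Z[keep] + cx).astype(int)
    v = np.round(fy * Y[keep] / Z[keep] + cy).astype(int)
    d = fx * baseline / Z[keep]        # convert depth back to disparity
    disp = np.zeros((H, W), dtype=np.float32)
    inb = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    # Nearest surface wins per pixel: larger disparity means smaller depth.
    np.maximum.at(disp, (v[inb], u[inb]), d[inb])
    return disp
```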
Updated: 2021-04-29