Probabilistic and Geometric Depth: Detecting Objects in Perspective
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2021-07-29 , DOI: arxiv-2107.14160
Tai Wang, Xinge Zhu, Jiangmiao Pang, Dahua Lin

3D object detection is an important capability needed in various practical applications such as driver assistance systems. Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results. This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem: The inaccurate instance depth blocks all the other 3D attribute predictions from improving the overall detection performance. However, recent methods directly estimate the depth based on isolated instances or pixels while ignoring the geometric relations across different objects, which can be valuable constraints as the key information about depth is not directly manifest in the monocular image. Therefore, we construct geometric relation graphs across predicted objects and use the graph to facilitate depth estimation. As the preliminary depth estimation of each instance is usually inaccurate in this ill-posed setting, we incorporate a probabilistic representation to capture the uncertainty. It provides an important indicator to identify confident predictions and further guide the depth propagation. Despite the simplicity of the basic idea, our method obtains significant improvements on KITTI and nuScenes benchmarks, achieving the 1st place out of all monocular vision-only methods while still maintaining real-time efficiency. Code and models will be released at https://github.com/open-mmlab/mmdetection3d.
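The abstract does not spell out how the probabilistic representation or the depth propagation is computed, so the following is only a minimal illustrative sketch, not the authors' implementation. It assumes (hypothetically) that each instance's depth is predicted as logits over discretized depth bins, that the peak bin probability serves as a confidence score, and that a geometric relation between two objects can be summarized as an estimated depth offset. The names `probabilistic_depth`, `propagate_depth`, `bin_centers`, and `offsets` are all made up for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def probabilistic_depth(logits, bin_centers):
    """Return (expected depth, confidence) from per-bin logits.

    Assumption: depth is discretized into bins; the confidence is taken
    to be the peak bin probability (a hypothetical choice).
    """
    probs = softmax(logits)
    depth = sum(p * c for p, c in zip(probs, bin_centers))
    confidence = max(probs)
    return depth, confidence


def propagate_depth(depths, confidences, offsets):
    """Fuse each object's own depth with estimates propagated from others.

    offsets[j][i] is a hypothetical geometric estimate of
    (depth of object i) - (depth of object j). Each contribution is
    weighted by the confidence of the object it comes from.
    """
    n = len(depths)
    fused = []
    for i in range(n):
        weighted_sum = confidences[i] * depths[i]
        weight_total = confidences[i]
        for j in range(n):
            if j == i:
                continue
            weighted_sum += confidences[j] * (depths[j] + offsets[j][i])
            weight_total += confidences[j]
        fused.append(weighted_sum / weight_total)
    return fused


if __name__ == "__main__":
    # Object 0 is confident at 10 m; object 1 is uncertain at 20 m.
    # The assumed geometry says object 1 lies 8 m beyond object 0.
    depths = [10.0, 20.0]
    confidences = [0.9, 0.1]
    offsets = [[0.0, 8.0], [-8.0, 0.0]]
    print(propagate_depth(depths, confidences, offsets))
```

In this toy setup the uncertain estimate for object 1 is pulled toward the value implied by the confident object 0, while object 0 barely moves. That is the intuition behind letting confident predictions guide the propagation.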

Updated: 2021-07-30