From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection
arXiv - CS - Computer Vision and Pattern Recognition Pub Date: 2021-07-30, DOI: arxiv-2107.14391
Jiajun Deng, Wengang Zhou, Yanyong Zhang, Houqiang Li

As an emerging data modality with precise distance sensing, LiDAR point clouds carry great expectations for 3D scene understanding. However, point clouds are sparsely distributed in 3D space and stored without structure, which makes them difficult to represent for effective 3D object detection. To this end, in this work, we regard point clouds as hollow-3D data and propose a new architecture, namely Hallucinated Hollow-3D R-CNN ($\text{H}^2$3D R-CNN), to address the problem of 3D object detection. In our approach, we first extract multi-view features by sequentially projecting the point clouds into the perspective view and the bird's-eye view. Then, we hallucinate the 3D representation with a novel bilaterally guided multi-view fusion block. Finally, 3D objects are detected via a box refinement module with a novel Hierarchical Voxel RoI Pooling operation. The proposed $\text{H}^2$3D R-CNN provides a new angle from which to take full advantage of the complementary information in the perspective view and the bird's-eye view within an efficient framework. We evaluate our approach on the public KITTI Dataset and the Waymo Open Dataset. Extensive experiments demonstrate the superiority of our method over state-of-the-art algorithms in both effectiveness and efficiency. The code will be made available at \url{https://github.com/djiajunustc/H-23D_R-CNN}.
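The two projections the abstract mentions can be sketched in a few lines of NumPy. The sketch below is an illustrative assumption, not the paper's actual configuration: the grid extents and resolution are KITTI-like placeholders, and the function names (`points_to_bev`, `points_to_perspective`) are invented for this example. The bird's-eye view quantizes points onto a ground-plane occupancy grid; the perspective (range) view maps each point to spherical coordinates.

```python
import numpy as np

def points_to_bev(points, x_range=(0.0, 70.4), y_range=(-40.0, 40.0), res=0.1):
    """Quantize LiDAR points of shape (N, 3) into a bird's-eye-view
    occupancy grid. Extents and resolution are illustrative defaults."""
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]
    nx = int(round((x_range[1] - x_range[0]) / res))
    ny = int(round((y_range[1] - y_range[0]) / res))
    bev = np.zeros((ny, nx), dtype=np.float32)
    ix = ((pts[:, 0] - x_range[0]) / res).astype(int)  # forward axis -> columns
    iy = ((pts[:, 1] - y_range[0]) / res).astype(int)  # lateral axis -> rows
    bev[iy, ix] = 1.0  # mark occupied cells
    return bev

def points_to_perspective(points):
    """Map points of shape (N, 3) to (azimuth, elevation, depth), the usual
    spherical basis for a LiDAR perspective / range view."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)
    azimuth = np.arctan2(y, x)
    elevation = np.arcsin(z / np.maximum(depth, 1e-6))
    return np.stack([azimuth, elevation, depth], axis=1)
```

In the paper's pipeline, per-view feature maps extracted from projections like these are then fused back into a 3D representation; the fusion block and RoI pooling themselves are not reproduced here.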

Updated: 2021-08-02