CrossFusion net: Deep 3D object detection based on RGB images and point clouds in autonomous driving
Image and Vision Computing (IF 4.7), Pub Date: 2020-06-03, DOI: 10.1016/j.imavis.2020.103955
Dza-Shiang Hong, Hung-Hao Chen, Pei-Yung Hsiao, Li-Chen Fu, Siang-Min Siao

In recent years, accurate 3D detection has played an important role in many applications; autonomous driving is a typical example. This paper aims to design an accurate 3D detector that takes both LiDAR point clouds and RGB images as inputs, exploiting the fact that LiDAR and cameras have complementary merits. A novel deep end-to-end two-stream learnable architecture, CrossFusion Net, is designed to extract features from both LiDAR point clouds and RGB images through a hierarchical fusion structure. Specifically, CrossFusion Net obtains a bird's eye view (BEV) of the point cloud through projection. The feature maps of the two streams are then fused through the newly introduced CrossFusion (CF) layer, which transforms the feature maps of one stream into the other based on the spatial relationship between the BEV and the RGB image. Additionally, an attention mechanism is applied to the transformed feature map and the original one to automatically weigh the importance of the feature maps from the two sensors. Experiments on the challenging KITTI car 3D detection and BEV detection benchmarks show that the presented approach outperforms other state-of-the-art methods in average precision (AP); in particular, it outperforms UberATG-ContFuse [3] by 8% AP on moderate 3D car detection. Furthermore, the proposed network learns an effective representation of the surroundings via the RGB and BEV feature maps.
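The two steps the abstract describes, projecting the point cloud to a BEV grid and attention-gating between a stream's original feature map and the one transformed from the other stream, can be illustrated with a minimal sketch. All function names, grid parameters, and the sigmoid gating form below are illustrative assumptions; the actual CF layer uses learned transforms and attention inside the network, not this toy formulation.

```python
import numpy as np

def project_to_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell=0.5):
    """Project LiDAR points of shape (N, 3) onto a bird's-eye-view height grid.

    A simplified stand-in for the paper's BEV projection; the real input
    rasterizes richer features (e.g. height slices, intensity).
    """
    h = int((x_range[1] - x_range[0]) / cell)
    w = int((y_range[1] - y_range[0]) / cell)
    bev = np.zeros((h, w), dtype=np.float32)
    for x, y, z in points:
        if x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]:
            i = int((x - x_range[0]) / cell)
            j = int((y - y_range[0]) / cell)
            bev[i, j] = max(bev[i, j], z)  # keep the max height per cell
    return bev

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_fuse(original, transformed, w_gate=1.0):
    """Gate between a stream's original feature map and the map transformed
    from the other stream (hypothetical stand-in for the CF-layer attention).
    """
    gate = sigmoid(w_gate * (original - transformed))  # per-element weight in (0, 1)
    return gate * original + (1.0 - gate) * transformed  # convex combination
```

Because the fusion is a convex combination, each fused value stays between the corresponding original and transformed values; in the network this weighting is learned per location rather than fixed.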




Updated: 2020-06-03