MAFF-Net: Filter False Positive for 3D Vehicle Detection with Multi-modal Adaptive Feature Fusion
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2020-09-23, DOI: arxiv-2009.10945
Zehan Zhang, Ming Zhang, Zhidong Liang, Xian Zhao, Ming Yang, Wenming Tan, and ShiLiang Pu

3D vehicle detection based on multi-modal fusion is an important task in many applications such as autonomous driving. Although significant progress has been made, we still observe two aspects that need further improvement. First, the specific gain that camera images can bring to 3D detection has seldom been explored in previous works. Second, many fusion algorithms run slowly, whereas speed is essential for applications with strict real-time requirements such as autonomous driving. To this end, we propose an end-to-end trainable single-stage multi-modal feature adaptive network, which uses image information to effectively reduce false positives in 3D detection while maintaining a fast detection speed. A multi-modal adaptive feature fusion module based on a channel attention mechanism is proposed to enable the network to adaptively use the features of each modality. Based on this mechanism, two fusion techniques are proposed to suit different usage scenarios: PointAttentionFusion is suitable for filtering simple false positives and is faster; DenseAttentionFusion is suitable for filtering harder false positives and has better overall performance. Experimental results on the KITTI dataset demonstrate a significant improvement in filtering false positives over the approach using only point cloud data. Furthermore, the proposed method provides competitive results and is the fastest among the published state-of-the-art multi-modal methods on the KITTI benchmark.
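To make the channel-attention-based adaptive fusion idea concrete, the sketch below shows one plausible way to re-weight and combine per-point LiDAR features with image features gathered at the corresponding pixels, in the spirit of the PointAttentionFusion variant. This is not the authors' implementation; the feature sizes, the gating MLP layout, and the final projection are assumptions made purely for illustration.

```python
# Illustrative sketch only (not MAFF-Net's released code): channel-attention
# fusion of per-point LiDAR features with image features sampled at each
# point's projected pixel. All layer sizes are assumed for demonstration.
import torch
import torch.nn as nn


class ChannelAttentionFusion(nn.Module):
    def __init__(self, lidar_channels: int = 64, image_channels: int = 64):
        super().__init__()
        fused = lidar_channels + image_channels
        # Small MLP predicting per-channel weights from the concatenated features,
        # so the network can adaptively emphasize either modality's channels.
        self.gate = nn.Sequential(
            nn.Linear(fused, fused // 4),
            nn.ReLU(inplace=True),
            nn.Linear(fused // 4, fused),
            nn.Sigmoid(),
        )
        self.out = nn.Linear(fused, lidar_channels)

    def forward(self, lidar_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        # lidar_feat, image_feat: (N_points, C) tensors; image features are assumed
        # to have been gathered at each point's projected image location beforehand.
        cat = torch.cat([lidar_feat, image_feat], dim=-1)
        weights = self.gate(cat)        # adaptive per-channel attention weights in (0, 1)
        fused = cat * weights           # re-weight each modality's channels
        return self.out(fused)          # project back to the LiDAR feature width


if __name__ == "__main__":
    fuse = ChannelAttentionFusion(64, 64)
    pts = torch.randn(1024, 64)         # toy per-point LiDAR features
    img = torch.randn(1024, 64)         # toy per-point image features
    print(fuse(pts, img).shape)         # torch.Size([1024, 64])
```

A dense variant in the same spirit would apply the gating over full bird's-eye-view feature maps rather than per-point vectors, trading speed for stronger suppression of hard false positives, which matches the trade-off the abstract describes between the two fusion techniques.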

Updated: 2020-09-24