Graph Neural Network and Spatiotemporal Transformer Attention for 3D Video Object Detection From Point Clouds. IEEE Transactions on Pattern Analysis and Machine Intelligence


IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 2023-06-30, DOI: 10.1109/tpami.2021.3125981
Junbo Yin, Jianbing Shen, Xin Gao, David J. Crandall, Ruigang Yang


Previous work on LiDAR-based 3D object detection mainly focuses on the single-frame paradigm. In this paper, we propose to detect 3D objects by exploiting temporal information in multiple frames, i.e., point cloud videos. We empirically categorize the temporal information into short-term and long-term patterns. To encode the short-term data, we present a Grid Message Passing Network (GMPNet), which considers each grid (i.e., the grouped points) as a node and constructs a k-NN graph with the neighbor grids. To update features for a grid, GMPNet iteratively collects information from its neighbors, thus mining the motion cues in grids from nearby frames. To further aggregate long-term frames, we propose an Attentive Spatiotemporal Transformer GRU (AST-GRU), which contains a Spatial Transformer Attention (STA) module and a Temporal Transformer Attention (TTA) module. STA and TTA enhance the vanilla GRU to focus on small objects and better align moving objects. Our overall framework supports both online and offline video object detection in point clouds. We implement our algorithm on top of prevalent anchor-based and anchor-free detectors. Evaluation results on the challenging nuScenes benchmark show the superior performance of our method, which ranked first on the leaderboard (at the time of paper submission) without any "bells and whistles." Our source code is available at https://github.com/shenjianbing/GMP3D.
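
To make the graph idea in the abstract concrete, the following is a minimal sketch of message passing over a k-NN graph of grid nodes. It is not the authors' GMPNet: the abstract does not specify the message or update functions, so the fixed "average the neighbors, then blend" rule, the function names `knn_graph` and `message_passing`, and the `mix` parameter are all assumptions made for illustration. Grid centers from a few nearby frames are assumed to be stacked into one list.

```python
import math


def knn_graph(centers, k):
    """For each grid center, return the indices of its k nearest other centers."""
    neighbors = []
    for i, ci in enumerate(centers):
        # Sort the other nodes by Euclidean distance (ties broken by index).
        dists = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(centers) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def message_passing(features, neighbors, num_iters=2, mix=0.5):
    """Iteratively blend each node feature with the mean of its neighbors.

    Assumption: a fixed mean aggregation and linear blend stand in for the
    learned message and update functions a real network would use.
    """
    feats = [list(f) for f in features]  # copy so the input is not mutated
    for _ in range(num_iters):
        updated = []
        for i, f in enumerate(feats):
            nbrs = neighbors[i]
            if not nbrs:
                updated.append(list(f))
                continue
            # Aggregate: element-wise mean of the neighbor features.
            mean = [
                sum(feats[j][d] for j in nbrs) / len(nbrs)
                for d in range(len(f))
            ]
            # Update: blend the node's own feature with the aggregate.
            updated.append(
                [(1 - mix) * f[d] + mix * mean[d] for d in range(len(f))]
            )
        feats = updated
    return feats


# Hypothetical toy input: four grid centers with one-dimensional features.
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
features = [[1.0], [0.0], [0.0], [10.0]]
graph = knn_graph(centers, k=2)
smoothed = message_passing(features, graph, num_iters=1)
```

Each extra iteration lets information travel one more hop along the graph, which is the sense in which repeated neighbor collection can pick up cues from grids in adjacent frames.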


Updated: 2021-11-09
