Lightweight convolutional neural network for real-time 3D object detection in road and railway environments
Journal of Real-Time Image Processing (IF 3), Pub Date: 2022-02-11, DOI: 10.1007/s11554-022-01202-6
A. Mauri 1, R. Khemmar 1, B. Decoux 1, M. Haddad 2, R. Boutteau 3

Smart mobility and autonomous vehicles (AVs) require a very precise perception of the environment to guarantee reliable decision-making, and results obtained for the road sector should extend to other areas such as rail. To this end, we introduce a new single-stage, monocular, real-time 3D object detection convolutional neural network (CNN) based on YOLOv5 and dedicated to smart mobility applications in both road and rail environments. To perform the 3D parameter regression, we replace YOLOv5's anchor boxes with our hybrid anchor boxes. Like YOLOv5, our method is available in several model sizes: small, medium, and large. We also propose a new model, the small split attention model (Small-SA), which is optimized for real-time embedded constraints (light weight, speed, and accuracy) and takes advantage of the improvement brought by split attention (SA) convolutions. To validate our CNN model, we introduce a new virtual dataset for both road and rail environments, built by leveraging the video game Grand Theft Auto V (GTAV). We provide extensive results for our different models on both KITTI and our own GTAV dataset. These results show that our method is the fastest available 3D object detection method, with accuracy close to that of state-of-the-art methods on the KITTI road dataset. We further demonstrate that pre-training on our GTAV virtual dataset improves accuracy on real datasets such as KITTI, allowing our method to obtain even greater accuracy than state-of-the-art approaches: 16.16% 3D average precision on hard car detection, with an inference time of 11.1 ms/image on an RTX 3080 GPU.
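To put the reported inference time in perspective, the short sketch below converts a per-image latency into an approximate frame rate. It is only illustrative arithmetic: the function name `latency_ms_to_fps` is hypothetical, and the calculation assumes images are processed one at a time with no batching and no pre- or post-processing overhead, which the abstract does not specify.

```python
def latency_ms_to_fps(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to frames per second.

    Assumes strictly sequential, one-image-at-a-time processing
    (an assumption of this sketch, not a detail stated in the abstract).
    """
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


# Latency reported in the abstract for an RTX 3080 GPU.
fps = latency_ms_to_fps(11.1)
print(f"{fps:.1f} frames per second")
```

Under these assumptions, 11.1 ms/image corresponds to roughly 90 frames per second.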




Updated: 2022-02-14