Three-stage RGBD architecture for vehicle and pedestrian detection using convolutional neural networks and stereo vision
IET Intelligent Transport Systems ( IF 2.3 ) Pub Date : 2020-09-17 , DOI: 10.1049/iet-its.2019.0367
Pedro Augusto Pinho Ferraz 1 , Bernardo Augusto Godinho Oliveira 1 , Flávia Magalhães Freitas Ferreira 1 , Carlos Augusto Paiva da Silva Martins 1

With the growth of autonomous vehicles and collision-avoidance systems, many approaches based on deep learning and convolutional neural networks (CNNs) aim to improve obstacle-detection accuracy. The authors introduce a three-stage architecture that adds side channels as low-level features to the input of existing CNNs. In a case study, the architecture extracts depth from stereo cameras and then composes RGBD inputs for state-of-the-art CNNs to improve their vehicle and pedestrian detection accuracy. This requires only simple modifications to the first layers of any existing CNN with RGB inputs. To validate the architecture, the state-of-the-art matching cost CNN (MC-CNN) and cascade residual learning (CRL), both specialist algorithms for extracting depth information, are combined with the state-of-the-art Faster region-based CNN (Faster R-CNN), MS-CNN, and Subcategory-aware Convolutional Neural Network (SubCNN) to yield the models tested on the KITTI benchmark dataset. In many cases, the accuracy (in terms of average precision) obtained with the proposal exceeds the original scores across several levels of detection difficulty, with improvements of up to +3.96% on the training and +1.50% on the testing KITTI datasets. The proposal also introduces efficient methods to initialise the weights of the depth convolutional filters during transfer learning using net surgery.
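
The abstract does not spell out the implementation, so the following is only a minimal NumPy sketch of the general idea: convert a stereo disparity map to depth, stack it with the RGB image into a four-channel input, and widen a pretrained first convolution layer from three to four input channels. All function names, the calibration values, and the two initialisation modes ("mean" of the RGB filters, or "zeros") are assumptions made for illustration; they are not necessarily the initialisation methods proposed by the authors.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Standard pinhole stereo relation: depth = focal * baseline / disparity."""
    disparity = np.asarray(disparity, dtype=np.float32)
    depth = np.zeros_like(disparity)
    valid = disparity > eps  # zero disparity means no stereo match; depth stays 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth


def compose_rgbd(rgb, depth):
    """Stack an (H, W, 3) image and an (H, W) depth map into an (H, W, 4) array."""
    rgb = np.asarray(rgb, dtype=np.float32)
    depth = np.asarray(depth, dtype=np.float32)
    return np.concatenate([rgb, depth[..., None]], axis=-1)


def extend_first_conv(weights_rgb, mode="mean"):
    """Widen (out, 3, kh, kw) pretrained filters to (out, 4, kh, kw).

    The extra depth slice is initialised by an assumed scheme:
    "mean"  -> average of the three RGB slices of each filter
    "zeros" -> zeros, so the widened layer initially ignores depth
    """
    out_c, in_c, kh, kw = weights_rgb.shape
    if in_c != 3:
        raise ValueError("expected filters with 3 input channels")
    if mode == "mean":
        depth_w = weights_rgb.mean(axis=1, keepdims=True)
    elif mode == "zeros":
        depth_w = np.zeros((out_c, 1, kh, kw), dtype=weights_rgb.dtype)
    else:
        raise ValueError("unknown mode: " + mode)
    # The pretrained RGB slices are copied unchanged; only the new slice is added.
    return np.concatenate([weights_rgb, depth_w], axis=1)


# Hypothetical calibration values and toy data, for illustration only.
disparity = np.array([[10.0, 0.0], [20.0, 5.0]])
depth = disparity_to_depth(disparity, focal_px=700.0, baseline_m=0.5)
rgbd = compose_rgbd(np.zeros((2, 2, 3)), depth)

pretrained = np.random.default_rng(0).normal(size=(8, 3, 3, 3)).astype(np.float32)
widened = extend_first_conv(pretrained, mode="mean")
```

With the "zeros" mode the widened layer reproduces the original RGB responses at the start of fine-tuning, whereas "mean" gives the depth channel a non-trivial starting filter; which behaviour works better is an empirical question that this sketch does not settle.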

Updated: 2020-09-18