Pedestrian motion recognition via Conv-VLAD integrated spatial-temporal-relational network
IET Intelligent Transport Systems ( IF 2.3 ) Pub Date : 2020-04-30 , DOI: 10.1049/iet-its.2019.0471
Shiyu Peng 1, 2 , Tingli Su 1, 2 , Xuebo Jin 1, 2 , Jianlei Kong 1, 2 , Yuting Bai 1, 2
Pedestrian motion recognition is an important component of intelligent transportation systems. Since commonly used spatial-temporal features are still insufficient for mining deep information in video frames, this study proposes a three-stream neural network called the spatial-temporal-relational network (STRN), in which static spatial information, dynamic motion, and differences between adjacent keyframes are jointly considered as features of the video records. In addition, an optimised pooling layer, the convolutional vector of locally aggregated descriptors (Conv-VLAD) layer, is employed before the final classification step in each stream to better aggregate the extracted features and reduce the inter-class differences. To accomplish this, the original video records are processed into RGB images, optical flow images, and RGB difference images to deliver the respective information to each stream. After a classification result is obtained from each stream, a decision-level fusion mechanism combines these partial understandings to improve the network's overall accuracy. Experimental results on two public datasets, UCF101 (94.7%) and HMDB51 (69.0%), show that the proposed method achieves significantly improved performance. The results of STRN have far-reaching significance for the application of deep learning in intelligent transportation systems to ensure pedestrian safety.
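The abstract's VLAD-style aggregation can be illustrated with a minimal NumPy sketch. This is not the paper's Conv-VLAD implementation; it follows the generic soft-assignment (NetVLAD-style) formulation, and the function name `soft_vlad`, the sharpness parameter `alpha`, the cluster count, and all array shapes are assumptions for the example. Each local descriptor is softly assigned to cluster centres, the assignment-weighted residuals are accumulated per cluster, and the result is normalised into a fixed-length vector.

```python
import numpy as np

def soft_vlad(features, centers, alpha=10.0):
    """Aggregate local descriptors into a fixed-length VLAD vector.

    features: (N, D) local descriptors (e.g. convolutional features).
    centers:  (K, D) cluster centres.
    Returns a flattened, L2-normalised (K * D,) descriptor.
    """
    # Soft-assignment weights from scaled negative squared distances,
    # stabilised by subtracting the per-row maximum before exponentiation.
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (N, K)
    logits = -alpha * d2
    logits -= logits.max(axis=1, keepdims=True)
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)

    # Accumulate assignment-weighted residuals per cluster.
    residuals = features[:, None, :] - centers[None, :, :]   # (N, K, D)
    vlad = (a[..., None] * residuals).sum(axis=0)            # (K, D)

    # Intra-normalise each cluster row, then flatten and L2-normalise.
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    v = vlad.ravel()
    return v / (np.linalg.norm(v) + 1e-12)

# Toy usage: 8 four-dimensional descriptors aggregated over 3 clusters.
rng = np.random.default_rng(0)
descriptor = soft_vlad(rng.normal(size=(8, 4)), rng.normal(size=(3, 4)))
```

Because the output length depends only on the number of clusters and the descriptor dimension, the pooled representation stays fixed-size regardless of how many local features the stream produces.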

Updated: 2020-04-30