Two-stream spatial-temporal neural networks for pose-based action recognition, Journal of Electronic Imaging


Two-stream spatial-temporal neural networks for pose-based action recognition
Journal of Electronic Imaging (IF 1.1), Pub Date: 2020-08-26, DOI: 10.1117/1.jei.29.4.043025
Zixuan Wang, Aichun Zhu, Fangqiang Hu, Qianyu Wu, Yifeng Li


Abstract. With recent advances in human pose estimation and human skeleton capture systems, pose-based action recognition has drawn considerable attention from researchers. Most existing action recognition methods are based on convolutional neural networks and long short-term memory networks and achieve strong performance, but they cannot explicitly exploit the rich spatial-temporal information among skeletons over the course of an action, which limits further gains in recognition accuracy. To address this issue, we introduce two-stream spatial-temporal neural networks for pose-based action recognition. First, pose features extracted from the raw video are processed by an action modeling module. Then, the temporal information and the spatial information, in the form of relative speed and relative distance, are fed into the temporal neural network and the spatial neural network, respectively. Afterward, the outputs of the two streams are fused for better action recognition. Finally, comprehensive experiments on the SUB-JHMDB, SYSU, MPII-Cooking, and NTU RGB+D datasets demonstrate the effectiveness of the proposed model.
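The abstract names relative distance and relative speed as the inputs to the spatial and temporal streams but does not define them. The sketch below is one plausible reading, not the authors' implementation. It assumes relative distance means the pairwise Euclidean distance between joints within a frame, relative speed means each joint's displacement between consecutive frames, and fusion is a weighted average of per-class scores. The function names and the `weight` parameter are hypothetical.

```python
import math


def relative_distances(frame):
    """Spatial feature (assumed): Euclidean distance between every pair of joints.

    `frame` is a list of (x, y) joint coordinates for a single video frame.
    """
    n = len(frame)
    return [
        math.dist(frame[i], frame[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]


def relative_speeds(sequence):
    """Temporal feature (assumed): per-joint displacement between consecutive frames.

    `sequence` is a list of frames; the result has one row per frame transition.
    """
    return [
        [math.dist(prev_joint, curr_joint) for prev_joint, curr_joint in zip(prev, curr)]
        for prev, curr in zip(sequence, sequence[1:])
    ]


def fuse_scores(spatial_scores, temporal_scores, weight=0.5):
    """Late fusion (assumed): weighted average of the two streams' class scores."""
    return [
        weight * s + (1.0 - weight) * t
        for s, t in zip(spatial_scores, temporal_scores)
    ]


if __name__ == "__main__":
    # Toy two-frame sequence with three joints; the coordinates are made up.
    sequence = [
        [(0.0, 0.0), (3.0, 4.0), (0.0, 4.0)],
        [(1.0, 0.0), (3.0, 4.0), (0.0, 5.0)],
    ]
    print(relative_distances(sequence[0]))
    print(relative_speeds(sequence))
    print(fuse_scores([0.2, 0.8], [0.6, 0.4]))
```

In a full pipeline these features would be passed to the spatial and temporal networks, and the fusion step would combine the two networks' outputs. The abstract does not say how fusion is performed, so the weighted average here is only a placeholder.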


Updated: 2020-08-26
