Exploiting spatio-temporal representation for 3D human action recognition from depth map sequences
Knowledge-Based Systems (IF 7.2), Pub Date: 2021-05-26, DOI: 10.1016/j.knosys.2021.107040
Xiaopeng Ji, Qingsong Zhao, Jun Cheng, Chenfei Ma

Human action recognition based on 3D data is attracting increasing attention because it can provide richer spatial and temporal information than RGB videos. The challenge for depth-map-based methods is to capture the cues linking spatial appearance and temporal motion. In this paper, we propose a straightforward and efficient framework for modeling human actions from depth map sequences that considers both short-term and long-term dependencies. A frame-level feature, termed the depth-oriented gradient vector (DOGV), is developed to capture appearance and motion over a short duration. For long-term dependencies, we construct a convolutional neural network (CNN) based backbone to aggregate frame-level features across space and time. The proposed method is comprehensively evaluated on four public benchmark datasets: NTU RGB+D, NTU RGB+D 120, PKU-MMD and UOW LSC. The experimental results demonstrate that the proposed approach addresses 3D human action recognition efficiently and achieves state-of-the-art performance.




Updated: 2021-06-05