Part-wise Spatio-temporal Attention Driven CNN-based 3D Human Action Recognition
ACM Transactions on Multimedia Computing, Communications, and Applications (IF 5.1), Pub Date: 2021-07-22, DOI: 10.1145/3441628
Chhavi Dhiman 1 , Dinesh Kumar Vishwakarma 2 , Paras Agarwal 3

Human activity recognition from skeleton data has recently attracted growing interest because skeleton data is easy to acquire and captures fine shape details. It still suffers from wide intra-class variation, inter-class similarity among actions, and view variation, so extracting discriminative spatial and temporal features remains a challenging problem. To address this, we present a novel Residual Inception Attention Driven CNN network (RIAC-Net), which visualizes the dynamics of an action in a part-wise manner. The complete skeleton is partitioned into five key parts: Head to Spine, Left Leg, Right Leg, Left Hand, and Right Hand. For each part, a Compact Action Skeleton Sequence (CASS) is defined. Part-wise skeleton-based motion dynamics highlight discriminative local features of the skeleton, which helps to overcome inter-class similarity and intra-class variation and improves recognition performance. The RIAC-Net architecture is inspired by the inception-residual representation and unifies Attention Driven Residues (ADR) with inception-based Spatio-Temporal Convolution Features (STCF) to learn salient action features efficiently. An ablation study analyzes the effect of ADR compared with a simple residue-based action representation. The robustness of the proposed framework is evaluated through extensive experiments on four challenging datasets: UT Kinect Action 3D, Florence 3D Action, MSR Daily Action3D, and NTU RGB-D. The results consistently demonstrate the superiority of the proposed method over other state-of-the-art methods.
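The part-wise partition described in the abstract can be illustrated with a minimal sketch. It only shows how a skeleton sequence might be split into the five named parts; it is not the authors' CASS construction, which the abstract does not specify. The joint indices assume the 20-joint Kinect v1 ordering, and the names `PARTS` and `split_into_parts` are hypothetical.

```python
# Hypothetical grouping of joints into the five parts named in the abstract.
# ASSUMPTION: indices follow the 20-joint Kinect v1 ordering
# (0 hip center, 1 spine, 2 shoulder center, 3 head, 4-7 left arm,
#  8-11 right arm, 12-15 left leg, 16-19 right leg).
PARTS = {
    "head_to_spine": [3, 2, 1, 0],
    "left_hand": [4, 5, 6, 7],
    "right_hand": [8, 9, 10, 11],
    "left_leg": [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}


def split_into_parts(sequence, parts=PARTS):
    """Split a skeleton sequence into per-part sub-sequences.

    sequence: list of frames; each frame is a list of (x, y, z) joint tuples.
    Returns a dict mapping part name -> list of frames, where each frame
    holds only the joints that belong to that part, in the listed order.
    """
    return {
        name: [[frame[j] for j in joint_ids] for frame in sequence]
        for name, joint_ids in parts.items()
    }


# Toy input: 3 frames, 20 joints each, with easily traceable coordinates.
demo_sequence = [
    [(float(j), float(j) + t, 0.0) for j in range(20)] for t in range(3)
]
demo_parts = split_into_parts(demo_sequence)
```

Each entry of `demo_parts` keeps the temporal order of the frames, so a per-part representation could be built from it independently of the other parts.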

Updated: 2021-07-22