Skeleton-based action recognition via spatial and temporal transformer networks
Computer Vision and Image Understanding (IF 4.3), Pub Date: 2021-05-08, DOI: 10.1016/j.cviu.2021.103219
Chiara Plizzari, Marco Cannici, Matteo Matteucci

Skeleton-based Human Activity Recognition has attracted great interest in recent years, as skeleton data has proven robust to illumination changes, body scales, dynamic camera views, and complex backgrounds. In particular, Spatial–Temporal Graph Convolutional Networks (ST-GCN) have been shown to be effective in learning both spatial and temporal dependencies on non-Euclidean data such as skeleton graphs. Nevertheless, effectively encoding the latent information underlying the 3D skeleton is still an open problem, especially when it comes to extracting useful information from joint motion patterns and their correlations. In this work, we propose a novel Spatial–Temporal Transformer network (ST-TR) which models dependencies between joints using the Transformer self-attention operator. In our ST-TR model, a Spatial Self-Attention module (SSA) is used to understand intra-frame interactions between different body parts, and a Temporal Self-Attention module (TSA) is used to model inter-frame correlations. The two are combined in a two-stream network, whose performance is evaluated on three large-scale datasets, NTU-RGB+D 60, NTU-RGB+D 120, and Kinetics Skeleton 400, consistently improving backbone results. Compared with methods that use the same input data, the proposed ST-TR achieves state-of-the-art performance on all datasets when using joints' coordinates as input, and results on par with the state of the art when adding bone information.
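
To make the SSA/TSA distinction concrete, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes plain single-head scaled dot-product self-attention, random projection matrices, and an illustrative input of 4 frames with 25 joints and 3 coordinates per joint; the function names and all sizes are hypothetical. Spatial attention lets joints of the same frame attend to each other, while temporal attention lets one joint attend to itself across frames.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x is N tokens by C features. Returns (output N x D, attention N x N).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    attn = [softmax([s / scale for s in row]) for row in scores]
    return matmul(attn, v), attn


def spatial_self_attention(seq, w_q, w_k, w_v):
    """seq[t][v] is the feature vector of joint v in frame t.

    Tokens are the joints of one frame (intra-frame interactions).
    """
    return [self_attention(frame, w_q, w_k, w_v)[0] for frame in seq]


def temporal_self_attention(seq, w_q, w_k, w_v):
    """Tokens are the frames of one joint (inter-frame correlations)."""
    num_frames, num_joints = len(seq), len(seq[0])
    per_joint = []
    for v in range(num_joints):
        track = [seq[t][v] for t in range(num_frames)]
        per_joint.append(self_attention(track, w_q, w_k, w_v)[0])
    # Rearrange from [joint][frame] back to the [frame][joint] layout.
    return [[per_joint[v][t] for v in range(num_joints)] for t in range(num_frames)]


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Illustrative sizes only (assumptions): frames, joints, input and output channels.
random.seed(0)
T, V, C, D = 4, 25, 3, 8
seq = [rand_matrix(V, C) for _ in range(T)]
w_q, w_k, w_v = rand_matrix(C, D), rand_matrix(C, D), rand_matrix(C, D)

ssa = spatial_self_attention(seq, w_q, w_k, w_v)
tsa = temporal_self_attention(seq, w_q, w_k, w_v)
print(len(ssa), len(ssa[0]), len(ssa[0][0]))  # frames, joints, channels
print(len(tsa), len(tsa[0]), len(tsa[0][0]))
```

Both functions return features in the same frame-by-joint layout, so in this sketch the only difference between them is which axis supplies the attention tokens. How the two modules are placed inside the two-stream network is not covered here.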




Updated: 2021-05-14