Two-Stream Temporal Convolutional Networks for Skeleton-Based Human Action Recognition
Journal of Computer Science and Technology (IF 1.9), Pub Date: 2020-05-01, DOI: 10.1007/s11390-020-0405-6
Jin-Gong Jia, Yuan-Feng Zhou, Xing-Wei Hao, Feng Li, Christian Desrosiers, Cai-Ming Zhang

With the growing popularity of somatosensory interaction devices, human action recognition is becoming attractive in many application scenarios. Skeleton-based action recognition is effective because the skeleton represents the positions and structure of the key points of the human body. In this paper, we use spatiotemporal vectors computed from skeleton sequences as the input feature representation of the network; these vectors are more sensitive to changes in the human skeleton than representations based on distance and angle features. In addition, we redesign residual blocks with different strides at different depths of the network to improve the ability of temporal convolutional networks (TCNs) to handle actions with long-term temporal dependencies. We propose two-stream temporal convolutional networks (TS-TCNs) that take full advantage of both the inter-frame and the intra-frame vector features of skeleton sequences in the spatiotemporal representation. The framework integrates these two feature representations so that each compensates for the other's shortcomings. A fusion loss function supervises the training of the two branch networks. Experiments on public datasets show that our network achieves superior performance, attaining a 1.2% improvement over the recent GCN-based method BGC-LSTM on the NTU RGB+D dataset.
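
The abstract does not define exactly how the two vector features or the fusion loss are computed. The Python sketch below shows one plausible reading under explicit assumptions: intra-frame vectors are taken from a single reference joint (hypothetically joint 0) to every joint in the same frame, inter-frame vectors are each joint's displacement between consecutive frames, and the fusion loss is a hypothetical weighted sum of the two branches' cross-entropy losses. The function names and the `weight` parameter are illustrative and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Frame = List[Vec3]  # one (x, y, z) position per joint


def sub(a: Vec3, b: Vec3) -> Vec3:
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def intra_frame_vectors(seq: List[Frame], ref: int = 0) -> List[List[Vec3]]:
    """Spatial vectors within each frame: every joint relative to joint `ref`.

    Assumption: a single reference joint (index 0 by default) is used here
    for illustration; the paper may pair joints differently.
    """
    return [[sub(joint, frame[ref]) for joint in frame] for frame in seq]


def inter_frame_vectors(seq: List[Frame]) -> List[List[Vec3]]:
    """Temporal vectors: each joint's displacement between consecutive frames.

    A sequence of T frames yields T - 1 frames of displacement vectors.
    """
    return [
        [sub(cur, prev) for prev, cur in zip(f0, f1)]
        for f0, f1 in zip(seq, seq[1:])
    ]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of class scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def fusion_loss(logits_a: Sequence[float], logits_b: Sequence[float],
                label: int, weight: float = 0.5) -> float:
    """Hypothetical fusion loss: weighted sum of the two branches' cross-entropies.

    `logits_a` could come from the intra-frame branch and `logits_b` from the
    inter-frame branch; `weight` balances them (an assumed hyperparameter).
    """
    ce_a = -log_softmax(logits_a)[label]
    ce_b = -log_softmax(logits_b)[label]
    return weight * ce_a + (1.0 - weight) * ce_b


if __name__ == "__main__":
    # Two frames, two joints each.
    seq = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 1.0, 0.0), (1.0, 2.0, 0.0)],
    ]
    print(intra_frame_vectors(seq))
    # [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]]
    print(inter_frame_vectors(seq))
    # [[(0.0, 1.0, 0.0), (0.0, 2.0, 0.0)]]
```

In this reading, the intra-frame vectors describe the body's configuration in each frame, while the inter-frame vectors describe motion, which is why the two streams can complement each other. Each feature sequence would then feed its own TCN branch; the strided residual TCN blocks themselves are not sketched here.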

Updated: 2020-05-01