STA-GCN: two-stream graph convolutional network with spatial–temporal attention for hand gesture recognition
The Visual Computer (IF 3.0) Pub Date: 2020-08-28, DOI: 10.1007/s00371-020-01955-w
Wei Zhang, Zeyi Lin, Jian Cheng, Cuixia Ma, Xiaoming Deng, Hongan Wang

Skeleton-based hand gesture recognition is an active research topic in computer graphics and computer vision and has a wide range of applications in VR/AR and robotics. Although spatial–temporal graph convolutional networks have been successfully used for skeleton-based hand gesture recognition, these works often use a fixed spatial graph defined by the hand skeleton tree or a fixed graph along the temporal dimension, which may not be optimal for hand gesture recognition. In this paper, we propose a two-stream graph attention convolutional network with spatial–temporal attention for hand gesture recognition. We adopt a pose stream and a motion stream as the two input streams of our network. In the pose stream, we use the joint positions in each frame as the input; in the motion stream, we use the joint offsets between neighboring frames as the input. We propose a new temporal graph attention module to model the temporal dependency and also use a spatial graph attention module to construct a dynamic skeleton graph. For each stream, we adopt a graph convolutional network with spatial–temporal attention to extract features. Then, we concatenate the features of the pose stream and the motion stream for gesture recognition. We achieve competitive performance on the main hand gesture recognition benchmark datasets, which demonstrates the effectiveness of our method.
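To make the two-stream design concrete, below is a minimal sketch, assuming PyTorch and scaled dot-product attention for both the spatial and temporal graph attention modules. This is not the authors' implementation: the layer sizes, pooling, classifier head, and the 22-joint/14-class example are illustrative assumptions.

```python
# Minimal sketch of the two-stream spatial-temporal attention idea (not the authors' code).
# Tensor layout: (batch N, frames T, joints V, channels C).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialGraphAttention(nn.Module):
    """Builds a dynamic joint-to-joint graph per frame via scaled dot-product attention."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.query = nn.Linear(in_channels, out_channels)
        self.key = nn.Linear(in_channels, out_channels)
        self.value = nn.Linear(in_channels, out_channels)

    def forward(self, x):                           # x: (N, T, V, C)
        q, k, v = self.query(x), self.key(x), self.value(x)
        att = torch.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
        return att @ v                              # aggregate joints with the learned graph


class TemporalGraphAttention(nn.Module):
    """Models dependencies across frames for each joint (attention over the T axis)."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.query = nn.Linear(in_channels, out_channels)
        self.key = nn.Linear(in_channels, out_channels)
        self.value = nn.Linear(in_channels, out_channels)

    def forward(self, x):                           # x: (N, T, V, C)
        xt = x.transpose(1, 2)                      # (N, V, T, C): attend over frames per joint
        q, k, v = self.query(xt), self.key(xt), self.value(xt)
        att = torch.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, dim=-1)
        return (att @ v).transpose(1, 2)            # back to (N, T, V, C)


class STAStream(nn.Module):
    """One stream: spatial attention, then temporal attention, then global pooling."""
    def __init__(self, in_channels, hidden):
        super().__init__()
        self.spatial = SpatialGraphAttention(in_channels, hidden)
        self.temporal = TemporalGraphAttention(hidden, hidden)

    def forward(self, x):                           # x: (N, T, V, C)
        x = F.relu(self.spatial(x))
        x = F.relu(self.temporal(x))
        return x.mean(dim=(1, 2))                   # (N, hidden) global spatial-temporal pooling


class TwoStreamSTAGCN(nn.Module):
    """Pose stream on joint positions, motion stream on frame-to-frame joint offsets."""
    def __init__(self, in_channels=3, hidden=64, num_classes=14):
        super().__init__()
        self.pose_stream = STAStream(in_channels, hidden)
        self.motion_stream = STAStream(in_channels, hidden)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, joints):                      # joints: (N, T, V, 3) 3D joint positions
        motion = joints[:, 1:] - joints[:, :-1]     # offsets between neighboring frames
        motion = F.pad(motion, (0, 0, 0, 0, 1, 0))  # pad the first frame so T matches
        feat = torch.cat([self.pose_stream(joints),
                          self.motion_stream(motion)], dim=-1)  # concatenate stream features
        return self.classifier(feat)                # gesture class logits


# Example: 14 gesture classes, 64-frame clips of a hypothetical 22-joint hand skeleton.
model = TwoStreamSTAGCN(in_channels=3, hidden=64, num_classes=14)
logits = model(torch.randn(8, 64, 22, 3))           # (8, 14)
```

The key design choice reflected here is that the skeleton graph is not fixed: both the joint-to-joint and the frame-to-frame relations are learned per input through attention, and the two streams are fused only at the feature level by concatenation before the classifier.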
