
STA-GCN: two-stream graph convolutional network with spatial–temporal attention for hand gesture recognition

  • Original article
The Visual Computer

Abstract

Skeleton-based hand gesture recognition is an active research topic in computer graphics and computer vision, with a wide range of applications in VR/AR and robotics. Although spatial–temporal graph convolutional networks have been successfully applied to skeleton-based hand gesture recognition, existing works often rely on a fixed spatial graph derived from the hand skeleton tree or a fixed graph along the temporal dimension, which may not be optimal for hand gesture recognition. In this paper, we propose STA-GCN, a two-stream graph convolutional network with spatial–temporal attention for hand gesture recognition. The network takes a pose stream and a motion stream as input: the pose stream uses the joint coordinates in each frame, and the motion stream uses the joint offsets between neighboring frames. We propose a new temporal graph attention module to model temporal dependencies and use a spatial graph attention module to construct a dynamic skeleton graph. For each stream, we extract features with a graph convolutional network equipped with spatial–temporal attention, and we then concatenate the features of the pose and motion streams for gesture recognition. Our method achieves competitive performance on the main hand gesture recognition benchmark datasets, which demonstrates its effectiveness.
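
To make the described pipeline concrete, below is a minimal PyTorch sketch of a two-stream graph convolutional network with spatial–temporal attention. It is not the authors' implementation: the joint count (22), frame count (32), class count (14), layer widths, and the use of a single learned affinity matrix per dimension in place of the paper's attention modules are all illustrative assumptions, and the class names (STABlock, StreamNet, TwoStreamSTAGCN) are hypothetical.

# Minimal sketch (not the authors' code) of a two-stream GCN with
# spatial-temporal attention for skeleton-based gesture recognition.
# Assumed setup: 22 hand joints, 3-D joint coordinates, 32 frames per clip.
import torch
import torch.nn as nn
import torch.nn.functional as F


class STABlock(nn.Module):
    """One graph-convolution block with spatial and temporal attention."""

    def __init__(self, in_ch, out_ch, num_joints, num_frames):
        super().__init__()
        # Learnable spatial graph: joint-to-joint affinities (dynamic skeleton graph).
        self.spatial_attn = nn.Parameter(0.01 * torch.randn(num_joints, num_joints))
        # Learnable temporal graph: frame-to-frame affinities.
        self.temporal_attn = nn.Parameter(0.01 * torch.randn(num_frames, num_frames))
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        # x: (batch, channels, frames, joints)
        a_sp = F.softmax(self.spatial_attn, dim=-1)    # spatial attention weights
        a_tp = F.softmax(self.temporal_attn, dim=-1)   # temporal attention weights
        x = torch.einsum("bctv,vw->bctw", x, a_sp)     # aggregate over joints
        x = torch.einsum("bctv,ts->bcsv", x, a_tp)     # aggregate over frames
        return F.relu(self.bn(self.proj(x)))


class StreamNet(nn.Module):
    """Feature extractor shared in structure by the pose and motion streams."""

    def __init__(self, num_joints=22, num_frames=32):
        super().__init__()
        self.blocks = nn.Sequential(
            STABlock(3, 64, num_joints, num_frames),
            STABlock(64, 128, num_joints, num_frames),
        )

    def forward(self, x):
        x = self.blocks(x)            # (batch, 128, frames, joints)
        return x.mean(dim=(2, 3))     # global average pooling -> (batch, 128)


class TwoStreamSTAGCN(nn.Module):
    """Pose stream + motion stream, features concatenated for classification."""

    def __init__(self, num_classes=14, num_joints=22, num_frames=32):
        super().__init__()
        self.pose_stream = StreamNet(num_joints, num_frames)
        self.motion_stream = StreamNet(num_joints, num_frames)
        self.classifier = nn.Linear(128 * 2, num_classes)

    def forward(self, pose):
        # pose: (batch, 3, frames, joints); motion = joint offsets between frames.
        motion = pose[:, :, 1:] - pose[:, :, :-1]
        motion = F.pad(motion, (0, 0, 1, 0))   # zero offset for the first frame
        feat = torch.cat([self.pose_stream(pose), self.motion_stream(motion)], dim=1)
        return self.classifier(feat)


if __name__ == "__main__":
    model = TwoStreamSTAGCN()
    clips = torch.randn(4, 3, 32, 22)   # 4 clips, 3-D joints, 32 frames, 22 joints
    print(model(clips).shape)           # torch.Size([4, 14])

In this sketch the spatial and temporal affinities are free parameters shared across all samples; the paper's spatial and temporal graph attention modules instead construct these graphs dynamically, so the code only illustrates the overall two-stream data flow.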

Acknowledgements

This work was supported by the National Key Research and Development Plan (2016YFB1001200, 2018YFC0809300), Natural Science Foundation of China (61473276, 61872346), and Natural Science Foundation of Beijing (L182052).

Author information

Corresponding authors

Correspondence to Xiaoming Deng or Hongan Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, W., Lin, Z., Cheng, J. et al. STA-GCN: two-stream graph convolutional network with spatial–temporal attention for hand gesture recognition. Vis Comput 36, 2433–2444 (2020). https://doi.org/10.1007/s00371-020-01955-w
