
Real-time spatial normalization for dynamic gesture classification

Original article · The Visual Computer

Abstract

In this paper, we propose a new spatial data generalization method applied to hand gesture recognition. Data gathering for gesture recognition is tedious, especially for dynamic gestures. The standard remedies for a lack of data are still either the expensive collection of new samples or the impractical use of hand-crafted data augmentation algorithms. While these solutions can yield improvements, they come with drawbacks. We argue that a better extrapolation of the limited data's common pattern, through improved generalization, should be considered first. We therefore propose a dynamic generalization method that captures and normalizes the spatial evolution of the input in real time. This procedure can be fully converted into a neural network processing layer, which we call the Evolution Normalization Layer. Experimental results on the SHREC2017 dataset show that adding the proposed layer improved the prediction accuracy of a standard sequence-processing model while requiring, on average, six times fewer weights for a comparable score. Furthermore, when trained on only 10% of the original training data, the standard model reached a maximum accuracy of only 36.5% on its own and 56.8% when a state-of-the-art processing method was applied to the data, whereas adding our layer alone achieved a prediction accuracy of 81.5%.
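The exact formulation of the Evolution Normalization Layer is given in the full paper; as an illustration of the general idea of real-time spatial normalization, the sketch below re-centers and rescales each incoming frame of 3-D joint coordinates using running statistics over the frames seen so far, so the output is produced online as the gesture unfolds. The function name and the particular running-centroid/running-extent scheme are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def online_spatial_normalize(frames):
    """Online spatial normalization of a gesture sequence.

    frames: array of shape (T, J, 3) -- T frames of J 3-D joints.
    Each frame is re-centered on the centroid of all frames observed so
    far and rescaled by the running spatial extent, so normalization can
    run in real time as frames arrive (no look-ahead over the sequence).
    NOTE: this is a hypothetical sketch, not the paper's layer.
    """
    frames = np.asarray(frames, dtype=float)
    out = np.empty_like(frames)
    running_sum = np.zeros(3)   # sum of all joint positions so far
    n_points = 0                # number of joint positions so far
    extent = 1e-8               # running max absolute extent (avoids /0)
    for t, frame in enumerate(frames):
        n_points += frame.shape[0]
        running_sum += frame.sum(axis=0)
        centroid = running_sum / n_points
        centered = frame - centroid
        extent = max(extent, np.abs(centered).max())
        out[t] = centered / extent   # every output lies in [-1, 1]
    return out
```

Because the statistics are cumulative rather than computed over the whole sequence, the first frame is normalized using only its own centroid, and later frames refine the estimate; this is the kind of per-step computation that can be folded into a recurrent processing layer.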




Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 62077037 and 61872241, in part by Shanghai Municipal Science and Technology Major Project under Grant 2021SHZDZX0102, in part by the Science and Technology Commission of Shanghai Municipality under Grants 18410750700 and 17411952600, and in part by Project of Shanghai Municipal Health Commission (2018ZHYL0230), and in part by The Hong Kong Polytechnic University under Grants P0030419, P0030929, and P0035358.

Author information

Correspondence to Bin Sheng or Lijuan Mao.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 2737 KB)


About this article


Cite this article

Zeghoud, S., Ali, S.G., Ertugrul, E. et al. Real-time spatial normalization for dynamic gesture classification. Vis Comput 38, 1345–1357 (2022). https://doi.org/10.1007/s00371-021-02229-9

