
Real-time spatial normalization for dynamic gesture classification

Original article · The Visual Computer

Abstract

In this paper, we propose a new spatial data generalization method applied to hand gesture recognition. Data gathering for gesture recognition is tedious, especially for dynamic gestures. The standard remedies for a lack of data are still either the expensive collection of new samples or the impractical use of hand-crafted data augmentation algorithms. While these solutions can yield improvements, they come with drawbacks. We argue that a better extrapolation of the limited data's common pattern, through improved generalization, should be considered first. We therefore propose a dynamic generalization method that captures and normalizes the spatial evolution of the input in real time. This procedure can be fully converted into a neural network processing layer, which we call the Evolution Normalization Layer. Experimental results on the SHREC2017 dataset show that adding the proposed layer improved the prediction accuracy of a standard sequence-processing model while requiring, on average, six times fewer weights for a comparable score. Furthermore, when trained on only 10% of the original training data, the standard model reached a maximum accuracy of only 36.5% on its own and 56.8% when a state-of-the-art processing method was applied to the data, whereas adding our layer alone achieved a prediction accuracy of 81.5%.
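The exact formulation of the Evolution Normalization Layer is given in the full paper; as an illustration of the general idea of real-time spatial normalization, the sketch below re-centers and rescales each incoming frame of 3-D joint coordinates using running statistics over the frames seen so far, so the output is produced online as the gesture unfolds. The function name and the particular running-centroid/running-extent scheme are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def online_spatial_normalize(frames):
    """Online spatial normalization of a gesture sequence.

    frames: array of shape (T, J, 3) -- T frames of J 3-D joints.
    Each frame is re-centered on the centroid of all frames observed so
    far and rescaled by the running spatial extent, so normalization can
    run in real time as frames arrive (no look-ahead over the sequence).
    NOTE: this is a hypothetical sketch, not the paper's layer.
    """
    frames = np.asarray(frames, dtype=float)
    out = np.empty_like(frames)
    running_sum = np.zeros(3)   # sum of all joint positions so far
    n_points = 0                # number of joint positions so far
    extent = 1e-8               # running max absolute extent (avoids /0)
    for t, frame in enumerate(frames):
        n_points += frame.shape[0]
        running_sum += frame.sum(axis=0)
        centroid = running_sum / n_points
        centered = frame - centroid
        extent = max(extent, np.abs(centered).max())
        out[t] = centered / extent   # every output lies in [-1, 1]
    return out
```

Because the statistics are cumulative rather than computed over the whole sequence, the first frame is normalized using only its own centroid, and later frames refine the estimate; this is the kind of per-step computation that can be folded into a recurrent processing layer.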




Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 62077037 and 61872241, in part by Shanghai Municipal Science and Technology Major Project under Grant 2021SHZDZX0102, in part by the Science and Technology Commission of Shanghai Municipality under Grants 18410750700 and 17411952600, and in part by Project of Shanghai Municipal Health Commission (2018ZHYL0230), and in part by The Hong Kong Polytechnic University under Grants P0030419, P0030929, and P0035358.

Author information

Correspondence to Bin Sheng or Lijuan Mao.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 2737 KB)


About this article


Cite this article

Zeghoud, S., Ali, S.G., Ertugrul, E. et al. Real-time spatial normalization for dynamic gesture classification. Vis Comput 38, 1345–1357 (2022). https://doi.org/10.1007/s00371-021-02229-9

