
Fast and robust key frame extraction method for gesture video based on high-level feature representation

Original Paper
Published in: Signal, Image and Video Processing

Abstract

In gesture video, the inter-frame difference is often too subtle to be captured by low-level features, and the gesture frames that express the semantic information make up only a small portion of the whole video. This paper introduces a fast and robust key frame extraction method for gesture video, founded on high-level feature representation, to extract gesture key frames precisely without losing semantic information. First, a gesture video segmentation model is designed by employing SSD, which classifies gesture video into semantic scenes and static scenes. Then, a 2D-DWT-based perceptual hash algorithm is applied to extract candidate static key frames. Afterward, the multi-channel gradient magnitude frequency histogram (HGMF-MC), based on an improved VGG16, is developed as a new image descriptor. Finally, a key frame extraction mechanism based on HGMF-MC is proposed to generate gesture video summaries for the two scenes, respectively. Experiments on the Chinese sign language, Cambridge, ChaLearn and CVRR-Hands gesture datasets consistently show the superiority of the proposed method. The results demonstrate that the proposed method is effective, improving the video compression ratio and outperforming state-of-the-art methods.
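The 2D-DWT-based perceptual hashing step lends itself to a compact illustration. The Python sketch below (using OpenCV, NumPy and PyWavelets) is not the authors' implementation; the hash size, the Haar wavelet and the Hamming-distance threshold are illustrative assumptions. Each frame is hashed from the LL sub-band of a 2D DWT, and a frame is flagged as a candidate static key frame when its hash differs sufficiently from the last retained one.

import cv2
import numpy as np
import pywt

def dwt_phash(frame_bgr, hash_size=8):
    # Perceptual hash built from the LL sub-band of a 2D Haar DWT.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (2 * hash_size, 2 * hash_size)).astype(np.float32)
    ll, _ = pywt.dwt2(small, "haar")        # LL sub-band is hash_size x hash_size
    return (ll > np.median(ll)).flatten()   # boolean hash vector

def hamming(h1, h2):
    # Number of differing bits between two boolean hash vectors.
    return int(np.count_nonzero(h1 != h2))

def candidate_static_key_frames(video_path, threshold=10):
    # Keep a frame as a candidate when its hash differs enough from the
    # last retained frame; the threshold value is an illustrative assumption.
    cap = cv2.VideoCapture(video_path)
    candidates, last_hash, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        h = dwt_phash(frame)
        if last_hash is None or hamming(last_hash, h) >= threshold:
            candidates.append(idx)
            last_hash = h
        idx += 1
    cap.release()
    return candidates

Calling candidate_static_key_frames("gesture.avi") would return the indices of frames whose appearance changes noticeably under this hash; in the paper, the candidate set is then processed by the HGMF-MC-based mechanism to produce the final summary.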

Acknowledgements

This work was supported by the National Natural Science Foundation of China (51405448). Qiuhong Tian acknowledges financial support from the doctoral research start-up funding of Zhejiang Sci-Tech University (18032117-Y). Qiaoli Zhuang acknowledges financial support from the doctoral research start-up funding of Zhejiang Sci-Tech University (19032141-Y). This work was also supported by Zhejiang University Student Science and Technology Achievement Promotion Project (14530031661961) and Zhejiang Sci-Tech University 2019 National University Students Innovation and Entrepreneurship Training Program (201910338012).

Author information

Corresponding author

Correspondence to Qiuhong Tian.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yang, H., Tian, Q., Zhuang, Q. et al. Fast and robust key frame extraction method for gesture video based on high-level feature representation. SIViP 15, 617–626 (2021). https://doi.org/10.1007/s11760-020-01783-4
