Human action recognition based on multi-scale feature maps from depth video sequences

Li, Chang; Huang, Qian; Li, Xing; Wu, Qianhan

doi:10.1007/s11042-021-11193-4

Published: 24 July 2021

Volume 80, pages 32111–32130 (2021)
Multimedia Tools and Applications

Chang Li¹, Qian Huang (ORCID: orcid.org/0000-0001-5625-0402)¹, Xing Li¹ & Qianhan Wu¹

Abstract

Human action recognition is an active research area in computer vision. Although great progress has been made, most previous methods recognize actions from depth video sequences at a single scale, and thus often neglect multi-scale spatial changes that provide additional information in practical applications. In this paper, we present a novel framework with a multi-scale mechanism that improves the scale diversity of motion features. We propose a multi-scale feature map called Laplacian pyramid depth motion images (LP-DMI). First, we employ depth motion images (DMI) as templates to generate a multi-scale static representation of actions. Then, we calculate LP-DMI to enhance the multi-scale dynamic information of motions and reduce redundant static information in human bodies. We further extract a multi-granularity descriptor called LP-DMI-HOG (based on histograms of oriented gradients) to provide more discriminative features. Finally, we use an extreme learning machine (ELM) for action classification. The proposed method achieves recognition accuracies of 93.41%, 85.12%, and 91.94% on the public MSRAction3D, UTD-MHAD, and DHA datasets, respectively. Extensive experiments show that our method outperforms state-of-the-art benchmarks.
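The abstract outlines the pipeline without equations, so the following is only one plausible reading of the LP-DMI step, written as a Python/NumPy sketch; the language and every function name are our own. Because the abstract does not define a DMI, `motion_template` is a hypothetical stand-in that sums absolute differences between consecutive depth frames (a depth motion map-style template), not the authors' DMI. The pyramid follows the classic Burt and Adelson construction: smooth and subsample to obtain Gaussian levels, store each level minus the upsampled next-coarser level, and keep the coarsest Gaussian level as the residual.

```python
import numpy as np

# 5-tap binomial kernel approximating a Gaussian, as in Burt and Adelson's pyramid.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def motion_template(depth_frames: np.ndarray) -> np.ndarray:
    """Hypothetical DMI stand-in: sum of absolute differences between
    consecutive depth frames (DMM-style), shape (T, H, W) -> (H, W)."""
    frames = np.asarray(depth_frames, dtype=float)
    return np.abs(np.diff(frames, axis=0)).sum(axis=0)


def blur(img: np.ndarray) -> np.ndarray:
    """Separable 5-tap smoothing with reflected borders; output shape == input shape."""
    h, w = img.shape
    padded = np.pad(img, 2, mode="reflect")
    rows = sum(KERNEL[k] * padded[:, k:k + w] for k in range(5))  # filter along x
    return sum(KERNEL[k] * rows[k:k + h, :] for k in range(5))    # filter along y


def expand(img: np.ndarray, shape: tuple) -> np.ndarray:
    """Upsample by 2 (zero insertion followed by smoothing) to the given shape."""
    up = np.zeros(shape)
    up[::2, ::2] = img
    return 4.0 * blur(up)  # factor 4 restores the intensity lost to zero insertion


def laplacian_pyramid(img: np.ndarray, levels: int) -> list:
    """Band-pass levels (fine to coarse) followed by the low-pass residual."""
    gauss = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        gauss.append(blur(gauss[-1])[::2, ::2])  # smooth, then subsample by 2
    lap = [g - expand(gauss[i + 1], g.shape) for i, g in enumerate(gauss[:-1])]
    lap.append(gauss[-1])  # coarsest Gaussian level kept as the residual
    return lap


def collapse(pyramid: list) -> np.ndarray:
    """Invert laplacian_pyramid: upsample and add levels from coarse to fine."""
    img = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        img = band + expand(img, band.shape)
    return img
```

A hypothetical call would be `laplacian_pyramid(motion_template(video), levels=3)` for a `(T, H, W)` depth array `video`. For a constant template every band-pass level is zero and only the residual survives, which illustrates how the band-pass levels discard slowly varying content, while `collapse` recovers the input exactly. Per-level descriptors could then be computed with an existing HOG implementation such as `skimage.feature.hog` and concatenated, loosely mirroring LP-DMI-HOG; the number of levels and the HOG cell sizes used by the authors are not given here.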
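For the classification stage, the following is a minimal extreme learning machine (ELM) sketch in Python/NumPy. An ELM has a single hidden layer whose input weights and biases are drawn at random and never trained; only the output weights are fitted, in closed form. The tanh activation, the hidden-layer size, and the ridge term `reg` are our assumptions, not settings reported in the abstract; with `reg=0` the output weights come from the Moore-Penrose pseudoinverse, as in the original ELM formulation.

```python
import numpy as np


class ELM:
    """Minimal single-hidden-layer extreme learning machine classifier.

    Input weights and biases are random and fixed; only the output weights
    are solved by (optionally ridge-regularised) least squares.
    """

    def __init__(self, n_hidden: int = 500, reg: float = 1e-3, seed: int = 0):
        self.n_hidden = n_hidden  # hypothetical default, not the paper's setting
        self.reg = reg            # ridge term; 0 selects the plain pseudoinverse ELM
        self.seed = seed

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Random nonlinear projection: H = tanh(X W + b)
        return np.tanh(X @ self.W + self.b)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ELM":
        rng = np.random.default_rng(self.seed)
        X = np.asarray(X, dtype=float)
        self.classes_, idx = np.unique(y, return_inverse=True)
        T = np.eye(len(self.classes_))[idx.ravel()]  # one-hot target matrix
        self.W = rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        if self.reg > 0:
            # beta = (H^T H + reg * I)^{-1} H^T T
            A = H.T @ H + self.reg * np.eye(self.n_hidden)
            self.beta = np.linalg.solve(A, H.T @ T)
        else:
            self.beta = np.linalg.pinv(H) @ T  # Moore-Penrose solution
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

Training amounts to a single linear solve, which is what makes ELMs fast to fit. In a full pipeline the concatenated descriptors would typically be standardised before `fit`, since heavily saturated tanh units carry little information.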

Acknowledgements

This work is partially supported by the National Key Research and Development Program of China under grant no. 2018YFC0407905 and by the Fundamental Research Funds for the Central Universities under grant no. B200202188.

Author information

Authors and Affiliations

School of Computer and Information, Hohai University, Nanjing, China
Chang Li, Qian Huang, Xing Li & Qianhan Wu

Corresponding author

Correspondence to Qian Huang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, C., Huang, Q., Li, X. et al. Human action recognition based on multi-scale feature maps from depth video sequences. Multimed Tools Appl 80, 32111–32130 (2021). https://doi.org/10.1007/s11042-021-11193-4

Received: 14 October 2020
Revised: 22 April 2021
Accepted: 24 June 2021
Published: 24 July 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s11042-021-11193-4
