
Human action interpretation using convolutional neural network: a survey

  • Original Paper
  • Published:
Machine Vision and Applications

Abstract

Human action interpretation (HAI) is one of the trending research areas in computer vision. It can be further divided into human action recognition (HAR) and human action detection (HAD). HAR analyzes frames and assigns label(s) to the video as a whole, whereas HAD first localizes the actor in each frame and then estimates an action score for the detected region. The effectiveness of an HAI model depends heavily on the representation of spatiotemporal features and on the model's architectural design. Various studies have addressed the effective representation of these features, and different deep architectures have been proposed to learn them and to derive action scores from them. Among deep architectures, the convolutional neural network (CNN) has been explored relatively more for HAI due to its lower computational cost. Several surveys of these efforts have been published to date; however, none of them examines the representation of features and the design of the proposed architectures in detail, and none covers pose-assisted HAI techniques. This study provides a more detailed survey of existing CNN-based HAI techniques, covering both frame-level and pose-level spatiotemporal feature-based approaches. It also offers a comparative study of the publicly available datasets used to evaluate HAI models built on various spatiotemporal feature representations. Finally, it discusses the limitations and challenges of HAI and concludes that interpreting human action from visual data still falls far short of true interpretation in realistic videos, which are continuous in nature and may contain multiple people performing multiple actions sequentially or in parallel.
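
To make the HAR/HAD distinction concrete, the following minimal PyTorch sketch contrasts the two problem settings: HAR pools spatiotemporal features over a whole clip and emits one label vector per video, whereas HAD regresses an actor bounding box for each per-frame region and then scores actions for that region. This is an illustrative sketch under assumed interfaces, not code from the survey or from any surveyed model; all names (HARHead, HADHead, clip_feats, region_feats) are hypothetical.

import torch
import torch.nn as nn

class HARHead(nn.Module):
    """HAR: classify the whole video from temporally pooled CNN features."""
    def __init__(self, feat_dim: int, num_actions: int):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, num_actions)

    def forward(self, clip_feats: torch.Tensor) -> torch.Tensor:
        # clip_feats: (batch, time, feat_dim) backbone features for one clip
        pooled = clip_feats.mean(dim=1)   # average over the temporal axis
        return self.classifier(pooled)    # one action-score vector per video

class HADHead(nn.Module):
    """HAD: localize the actor first, then score actions for that region."""
    def __init__(self, feat_dim: int, num_actions: int):
        super().__init__()
        self.box_regressor = nn.Linear(feat_dim, 4)  # (x, y, w, h) per region
        self.action_scorer = nn.Linear(feat_dim, num_actions)

    def forward(self, region_feats: torch.Tensor):
        # region_feats: (num_regions, feat_dim) features of per-frame regions
        boxes = self.box_regressor(region_feats)   # where is the actor?
        scores = self.action_scorer(region_feats)  # what is the actor doing?
        return boxes, scores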



Author information

Corresponding author

Correspondence to Zainab Malik.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Malik, Z., Shapiai, M.I.B. Human action interpretation using convolutional neural network: a survey. Machine Vision and Applications 33, 37 (2022). https://doi.org/10.1007/s00138-022-01291-0

