Abstract
Human action recognition from realistic video data is a challenging and relevant research area. The state of the art is led by methods based on convolutional neural networks (CNNs), especially two-stream CNNs. In this family of deep architectures, the appearance channel learns from RGB images, while the motion channel learns from a motion representation, usually the optical flow. Since action recognition requires extracting complex motion-pattern descriptors from image sequences, we introduce a new set of second-order motion representations that capture both geometrical and kinematic properties of the motion: curl, divergence, curvature, and acceleration. In addition, we present a new and effective strategy for the I3D two-stream CNN that reduces training time without sacrificing performance and is robust to the weakness of a single channel. The experiments in this paper were carried out on two of the most challenging datasets for action recognition: UCF101 and HMDB51. On UCF101, our approach improves accuracy, reaching 98.45% when curvature and acceleration are combined as a motion representation. On HMDB51, our approach shows competitive performance, achieving an accuracy of 80.19%. On both datasets, our approach considerably reduces preprocessing and training time: preprocessing time drops to a sixth, and the motion stream can be trained in a third of the time usually required.
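To illustrate the kind of quantities involved, the sketch below computes curl, divergence, streamline curvature, and convective acceleration from a dense optical-flow field using the standard vector-calculus definitions. This is a minimal NumPy illustration under the assumption of a flow array of shape (H, W, 2) with finite-difference derivatives; it is not the authors' implementation, whose exact formulation may differ.

```python
import numpy as np

def second_order_descriptors(flow):
    """Second-order motion maps from a dense optical-flow field.

    flow: array of shape (H, W, 2) holding the (u, v) displacement
    per pixel, with axis 0 as y and axis 1 as x.
    Returns per-pixel curl, divergence, streamline curvature, and
    acceleration magnitude.
    """
    u, v = flow[..., 0], flow[..., 1]
    # Spatial derivatives via finite differences (y-axis first).
    u_y, u_x = np.gradient(u)
    v_y, v_x = np.gradient(v)
    curl = v_x - u_y          # local rotation of the field
    div = u_x + v_y           # local expansion/contraction
    # Convective acceleration a = (w . grad) w for w = (u, v).
    a_x = u * u_x + v * u_y
    a_y = u * v_x + v * v_y
    accel = np.hypot(a_x, a_y)
    # Streamline curvature: kappa = (u*a_y - v*a_x) / |w|^3,
    # guarded against zero speed.
    speed3 = np.maximum(np.hypot(u, v), 1e-8) ** 3
    curvature = (u * a_y - v * a_x) / speed3
    return curl, div, curvature, accel
```

As a sanity check, a rigid rotation flow (u, v) = (-y, x) yields a constant curl of 2, zero divergence, and curvature 1/r at radius r, matching the circular streamlines.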
Notes
Due to a lack of computational resources, we could not train the second-order descriptors on the Kinetics dataset, on which the optical-flow stream was originally trained.
References
García RO, Morales EF, Sucar LE (2019) A novel scheme for training two-stream CNNs for action recognition. In: Iberoamerican congress on pattern recognition. Springer, Cham, pp 729–739
Ahad MAR, Tan JK, Kim H, Ishikawa S (2012) Motion history image: its variants and applications. Mach Vis Appl 23(2):255–281
Herath S, Harandi M, Porikli F (2017) Going deeper into action recognition: a survey. Image Vis Comput 1(60):4–21
Nanni L, Ghidoni S, Brahnam S (2017) Handcrafted vs. non-handcrafted features for computer vision classification. Pattern Recogn 71:158–172
Choutas V, Weinzaepfel P, Revaud J, Schmid C (2018) Potion: Pose motion representation for action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7024–7033
Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. In: Advances in neural information processing systems, pp 568–576
Wang H, Schmid C (2013) Action recognition with improved trajectories. In: Proceedings of the IEEE international conference on computer vision, pp 3551–3558
Carreira J, Zisserman A (2017) Quo vadis, action recognition? A new model and the kinetics dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6299–6308
Diba A, Fayyaz M, Sharma V, Karami AH, Arzani MM, Yousefzadeh R, Van Gool L (2017) Temporal 3d convnets: new architecture and transfer learning for video classification. arXiv preprint arXiv:1711.08200
Zach C, Pock T, Bischof H (2007) A duality based approach for realtime TV-L1 optical flow. In: Joint pattern recognition symposium. Springer, Berlin, pp 214–223
García RO, Valentin L, Risquet CP, Sucar LE (2017) A pathline-based background subtraction algorithm. In: 9th Mexican conference on pattern recognition, pp 179–188
Soomro K, Zamir AR, Shah M (2012) UCF101: A dataset of 101 human action classes from videos in the wild. arXiv preprint arXiv:1212.0402
Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T (2011) HMDB: a large video database for human motion recognition. In: 2011 international conference on computer vision. IEEE, pp 2556–2563
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. In: European conference on computer vision. Springer, Cham, pp 630–645
Ji S, Xu W, Yang M, Yu K (2012) 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167
Diba A, Pazandeh AM, Van Gool L (2016) Efficient two-stream motion and appearance 3d CNNs for video classification. arXiv preprint arXiv:1608.08851
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Tran D, Ray J, Shou Z, Chang SF, Paluri M (2017) Convnet architecture search for spatiotemporal feature learning. arXiv preprint arXiv:1708.05038
Yue-Hei Ng J, Hausknecht M, Vijayanarasimhan S, Vinyals O, Monga R, Toderici G (2015) Beyond short snippets: deep networks for video classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4694–4702
Donahue J, Anne Hendricks L, Guadarrama S, Rohrbach M, Venugopalan S, Saenko K, Darrell T (2015) Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2625–2634
Wang Y, Song J, Wang L, Van Gool L, Hilliges O (2016) Two-stream SR-CNNs for action recognition in videos. In: BMVC
Peng X, Schmid C (2016) Multi-region two-stream R-CNN for action detection. In: European conference on computer vision. Springer, Cham, pp 744–759
Saha S, Singh G, Sapienza M, Torr PH, Cuzzolin F (2016) Deep learning for detecting multiple space–time action tubes in videos. arXiv preprint arXiv:1608.01529
Hara K, Kataoka H, Satoh Y (2018) Can spatiotemporal 3d CNNs retrace the history of 2d CNNs and imagenet? In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6546–6555
Liu J, Shahroudy A, Xu D, Wang G (2016) Spatio-temporal LSTM with trust gates for 3d human action recognition. In: European conference on computer vision. Springer, Cham, pp 816–833
Baccouche M, Mamalet F, Wolf C, Garcia C, Baskurt A (2011) Sequential deep learning for human action recognition. In: International workshop on human behavior understanding. Springer, Berlin, pp 29–39
Ma S, Sigal L, Sclaroff S (2016) Learning activity progression in LSTMs for activity detection and early detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1942–1950
Du Y, Wang W, Wang L (2015) Hierarchical recurrent neural network for skeleton based action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1110–1118
Carreira J, Noland E, Hillier C, Zisserman A (2019) A short note on the kinetics-700 human action dataset. arXiv preprint arXiv:1907.06987
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826
Wang H, Kläser A, Schmid C, Liu CL (2013) Dense trajectories and motion boundary descriptors for action recognition. Int J Comput Vis 103(1):60–79
Rao C, Shah M (2001) View-invariance in action recognition. In: Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition (CVPR 2001). IEEE, vol 2, pp II-II
Bashir FI, Khokhar AA, Schonfeld D (2006) View-invariant motion trajectory-based activity classification and recognition. Multimed Syst 12(1):45–54
Chen H, Chirikjian GS (2019) Curvature: a signature for action recognition in video sequences. arXiv preprint arXiv:1904.13003
Weinkauf T, Theisel H (2002) Curvature measures of 3D vector fields and their applications. J WSCG, pp 507–514
Jain M, Jegou H, Bouthemy P (2013) Better exploiting motion for better action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2555–2562
Wang X, Qi C (2016) Action recognition using edge trajectories and motion acceleration descriptor. Mach Vis Appl 27(6):861–875
Canny JF (1983) Finding edges and lines in images. Massachusetts Institute of Technology, Artificial Intelligence Laboratory, Cambridge
Kroeger T, Timofte R, Dai D, Van Gool L (2016) Fast optical flow using dense inverse search. In: European conference on computer vision. Springer, Cham, pp 471–488
Farnebäck G (2003) Two-frame motion estimation based on polynomial expansion. In: Scandinavian conference on image analysis. Springer, Berlin, pp 363–370
Theisel H (1998) Visualizing the curvature of unsteady 2D flow fields. In: Proceedings of the 9th EG workshop on visualization in science computing, pp 47–56
Suter D (1994) Motion estimation and vector splines. In: CVPR, Vol 94, pp 939–942
Vetterling WT, Teukolsky SA, Press WH, Flannery BP (1989) Numerical recipes. University Press, Cambridge
Cruz C, Sucar LE, Morales EF (2008) Real-time face recognition for human–robot interaction. In: 2008 8th IEEE international conference on automatic face and gesture recognition. IEEE, pp 1–6
Reddy KK, Shah M (2013) Recognizing 50 human action categories of web videos. Mach Vis Appl 24(5):971–981
Yao Y, Rosasco L, Caponnetto A (2007) On early stopping in gradient descent learning. Construct Approx 26(2):289–315
Cite this article
Oves García, R., Morales, E.F. & Sucar, L.E. Second-order motion descriptors for efficient action recognition. Pattern Anal Applic 24, 473–482 (2021). https://doi.org/10.1007/s10044-020-00924-2