Abstract
In this paper, we address the problem of action recognition from still images and videos. Traditional local features such as SIFT and STIP pose two potential problems: 1) they are not evenly distributed across different instances of a given category, and 2) many of them are not exclusive to the visual concept those instances represent. To build a dictionary that accounts for both issues, we propose a novel discriminative method for identifying robust, category-specific local features that maximize class separability. Specifically, we cast the selection of potent local descriptors as a filtering-based feature selection problem, which ranks the local features per category according to a novel measure of distinctiveness. The underlying visual entities are then represented with respect to the learned dictionary, after which actions are classified with a random forest model and the predictions are refined by label propagation. The framework is validated on action recognition datasets of still images (Stanford-40) and videos (UCF-50), achieving recognition accuracies of 51.2% and 66.7%, respectively. Compared to other representative methods from the literature, our approach exhibits superior performance, demonstrating the effectiveness of the adaptive ranking methodology presented in this work.
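The pipeline summarized in the abstract (rank local descriptors per category by distinctiveness, keep the top-ranked ones as dictionary atoms, encode each image against the dictionary, then classify with a random forest) can be sketched as follows. This is a minimal illustration on synthetic descriptors, not the paper's implementation: the distinctiveness score below (mean inter-class distance minus mean intra-class distance) is a hypothetical stand-in for the paper's measure, and the label-propagation refinement stage is omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic local descriptors: 3 classes, 60 descriptors each, 16-D.
n_classes, n_per_class, dim = 3, 60, 16
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(n_per_class, dim))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

def distinctiveness(X, y, c):
    """Illustrative score: mean distance to other-class descriptors minus
    mean distance to same-class descriptors (higher = more category-specific)."""
    same, other = X[y == c], X[y != c]
    scores = []
    for d in same:
        d_same = np.linalg.norm(same - d, axis=1).mean()
        d_other = np.linalg.norm(other - d, axis=1).mean()
        scores.append(d_other - d_same)
    return np.array(scores)

# Rank descriptors per category; keep the top-k as dictionary atoms.
k = 10
atoms = []
for c in range(n_classes):
    same = X[y == c]
    top = np.argsort(distinctiveness(X, y, c))[::-1][:k]
    atoms.append(same[top])
dictionary = np.vstack(atoms)          # shape: (n_classes * k, dim)

def encode(descriptors, dictionary):
    """Hard-assignment bag-of-words histogram over dictionary atoms."""
    d = np.linalg.norm(descriptors[:, None] - dictionary[None], axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=len(dictionary))
    return hist / hist.sum()

# Treat each consecutive group of 10 descriptors as one "image" and classify.
imgs = np.array([encode(X[i:i + 10], dictionary) for i in range(0, len(X), 10)])
img_labels = y[::10]
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(imgs, img_labels)
print("train accuracy:", clf.score(imgs, img_labels))
```

Because the dictionary keeps only the most category-specific atoms, the resulting histograms are more discriminative than ones built from all descriptors uniformly, which is the intuition behind the adaptive ranking.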
Ethics declarations
Conflicts of Interest
The authors declare that they have no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Cite this article
Roy, A., Banerjee, B., Hussain, A. et al. Discriminative Dictionary Design for Action Classification in Still Images and Videos. Cogn Comput 13, 698–708 (2021). https://doi.org/10.1007/s12559-021-09851-8