
Multi-Stream 3D latent feature clustering for abnormality detection in videos

Applied Intelligence

Abstract

Detecting abnormal behavior in surveillance videos is essential for public safety and monitoring. Human-based surveillance, however, demands constant attention from operators, which is difficult to sustain, so automatic detection of such events is of great significance. Abnormal event detection is challenging because labelled data are scarce and such events occur with low probability. In this paper, we propose a novel multi-stream, two-stage architecture to detect abnormal behavior in videos. Our contributions are three-fold: 1) In the first stage, we propose a 3D Convolutional Autoencoder (3DCAE) architecture that extracts appearance and motion features, in an unsupervised manner, from both the video frame and dynamic flow input streams of training videos containing normal events. 2) We use a multi-objective loss function for 3DCAE reconstruction that focuses more on foreground moving objects than on stationary background information. 3) In the second stage, the fused latent features from the video frame and dynamic flow inputs are grouped into different clusters of normality. We then eliminate the smaller or sparse clusters, which are assumed to contain noisy patterns in the training data, so that the remaining clusters represent stronger normality patterns. A Deep one-class Support Vector Data Description (SVDD) classifier is then trained on these 3D normality clusters to generate an anomaly score for each sample, which differentiates normal from abnormal occurrences. Experimental results on three benchmark datasets (UCSD Pedestrian, Shanghai Tech, and Avenue) show significant improvement in performance compared to state-of-the-art approaches.
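The second stage described above (cluster the latent features, discard small clusters, score samples against what remains) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the fused latent features are already available as rows of a NumPy array, uses plain Lloyd's k-means for clustering, and, as a loudly simplified stand-in for the Deep SVDD classifier, scores each sample by its Euclidean distance to the nearest retained centroid. The function names and the `k`, `min_fraction`, `iters`, and `seed` parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def assign(x, centroids):
    """Return the index of the nearest centroid for every row of x."""
    # Broadcasting gives squared distances of shape (n_samples, n_centroids).
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(x, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct training samples.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(x, centroids)
        updated = centroids.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids, assign(x, centroids)


def fit_normality_clusters(latents, k=5, min_fraction=0.1):
    """Cluster normal-only latents and keep only the well-populated clusters.

    min_fraction is a hypothetical threshold: clusters holding fewer than
    this fraction of the training samples are treated as noise and dropped.
    """
    centroids, labels = kmeans(latents, k)
    counts = np.bincount(labels, minlength=k)
    keep = counts >= min_fraction * len(latents)
    return centroids[keep]


def anomaly_score(latents, centroids):
    """Distance to the nearest retained centroid (stand-in for Deep SVDD)."""
    d = np.linalg.norm(latents[:, None, :] - centroids[None, :, :], axis=2)
    return d.min(axis=1)
```

In use, `fit_normality_clusters` would be called once on latent features from normal training videos, and `anomaly_score` on test-time features; a larger score indicates a sample farther from every retained normality cluster. A learned one-class boundary such as Deep SVDD would replace the raw centroid distance in a faithful reproduction.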

Acknowledgments

This research is partly supported by NSFC, China (No: 61876107, U1803261) and Shanghai Natural Science Foundation (19ZR1476300).

Author information

Corresponding authors

Correspondence to Jie Yang or Enmei Tu.

Ethics declarations

Conflict of Interest

We have no conflict of interest to declare.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Asad, M., Jiang, H., Yang, J. et al. Multi-Stream 3D latent feature clustering for abnormality detection in videos. Appl Intell 52, 1126–1143 (2022). https://doi.org/10.1007/s10489-021-02356-9

  • DOI: https://doi.org/10.1007/s10489-021-02356-9
