Abstract
Detecting abnormal behavior in surveillance videos is essential for public safety and monitoring. However, human-operated surveillance systems demand constant attention from operators, which is difficult to sustain. Automatic detection of such events is therefore of great significance. Abnormal event detection is a challenging problem because labelled data are scarce and such events occur with low probability. In this paper, we propose a novel multi-stream two-stage architecture to detect abnormal behavior in videos. Our contributions are three-fold: 1) In the first stage, we propose a 3D Convolutional Autoencoder (3DCAE) architecture that extracts appearance and motion features, in an unsupervised manner, from both the video frame and the dynamic flow input streams of training videos containing only normal events. 2) We use a multi-objective loss function for 3DCAE reconstruction that focuses more on moving foreground objects than on stationary background information. 3) In the second stage, the fused latent features from the video frame and dynamic flow inputs are grouped into different clusters of normality. We then eliminate the smaller or sparser clusters, which are assumed to contain noisy patterns in the training data, so that the remaining clusters represent stronger normality patterns. A Deep one-class Support Vector Data Description (SVDD) classifier is then trained on these 3D normality clusters to generate an anomaly score for each sample in the clusters, which is used to differentiate between normal and abnormal occurrences. Experimental results on three benchmark datasets, UCSD Pedestrian, Shanghai Tech, and Avenue, show a significant improvement in performance over state-of-the-art approaches.
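The second stage described above (cluster the latent features, discard sparse clusters, score samples against the remaining normality clusters) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes k-means as the clustering algorithm, a fixed size threshold for pruning, and plain Euclidean distance to the nearest retained cluster center as a simple stand-in for the Deep SVDD score. The function names, the `min_fraction` parameter, and the toy 2D "latent features" are all hypothetical.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]


def kmeans(points, k, iters=10):
    """Plain Lloyd iterations (assumed here; the abstract does not name the algorithm)."""
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers, assign(points, centers)


def prune_sparse_clusters(centers, labels, min_fraction=0.1):
    """Keep centers whose cluster holds at least min_fraction of all samples (hypothetical rule)."""
    threshold = min_fraction * len(labels)
    return [c for j, c in enumerate(centers) if labels.count(j) >= threshold]


def anomaly_score(x, kept_centers):
    """Distance to the nearest retained center; a simplified stand-in for a Deep SVDD score."""
    return min(math.dist(x, c) for c in kept_centers)


# Toy latent features: two dense "normal" groups plus two noisy training samples.
normal_a = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(4)]
normal_b = [(5 + 0.1 * i, 5 + 0.1 * j) for i in range(5) for j in range(4)]
noise = [(20.0, 20.0), (20.5, 20.0)]
features = normal_a + normal_b + noise

centers, labels = kmeans(features, k=3)
kept = prune_sparse_clusters(centers, labels, min_fraction=0.1)

print(len(kept))                        # number of retained normality clusters
print(anomaly_score((0.2, 0.1), kept))  # sample near a dense cluster
print(anomaly_score((20.0, 20.0), kept))  # sample near the pruned noisy cluster
```

In this toy setting the two noisy samples form their own small cluster, which is pruned, so a test sample near them receives a large score. In the paper the score comes from a trained Deep SVDD classifier rather than a raw distance.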
Acknowledgments
This research is partly supported by NSFC, China (Nos. 61876107, U1803261) and the Shanghai Natural Science Foundation (19ZR1476300).
Ethics declarations
Conflict of Interest
We have no conflict of interest to declare.
About this article
Cite this article
Asad, M., Jiang, H., Yang, J. et al. Multi-Stream 3D latent feature clustering for abnormality detection in videos. Appl Intell 52, 1126–1143 (2022). https://doi.org/10.1007/s10489-021-02356-9
DOI: https://doi.org/10.1007/s10489-021-02356-9