Mobile Neural Architecture Search Network and Convolutional Long Short-Term Memory-Based Deep Features Toward Detecting Violence from Video

Jahlan, Heyam M. Bin; Elrefaei, Lamiaa A.

doi:10.1007/s13369-021-05589-5

Mobile Neural Architecture Search Network and Convolutional Long Short-Term Memory-Based Deep Features Toward Detecting Violence from Video

Research Article-Computer Engineering and Computer Science
Published: 08 April 2021

Volume 46, pages 8549–8563, (2021)
Cite this article

Arabian Journal for Science and Engineering Aims and scope Submit manuscript

364 Accesses
4 Citations
Explore all metrics

Abstract

Recently, surveillance cameras are deployed in many public places to monitor human activities. Detecting violence in videos through automatic analysis means significant for law enforcement. But almost many monitoring systems require to manually identify violent scenes in the video which leads to slow response. However, violence detection is a challenging problem because of the broad definition of violence. In this work, we will concern with physical violence that involved two persons or more. This work proposed a novel method to detect violence using automated mobile neural architecture search network and convolution long short-term-memory to extract spatiotemporal features in the video, and then adding two types of pooling layers max and average pooling to capture richer features, standard scaling these features and reducing the dimension using linear discriminative analysis to remove redundant features, and making classifier algorithms working well in low dimension. For classification, we trained and tested various machine learning models which are random forest, support vector machine (SVM), and K-nearest neighbor classifiers. We develop a combined dataset that contains violence and non-violence scenes from public datasets: hockey, movie, and violent flow. The performance of the proposed method is evaluated on a combined dataset in addition to three benchmark datasets, hockey, movie, and violent flow datasets in terms of detection accuracy. The results of our model showed high performance in combined, movie, and violent flow datasets using SVM classifier with accuracies of 97.5%, 100%, and 96%, respectively, whereas in the hockey dataset, we achieve the best result of 99.3% using the random forest classifier.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 6

Detection of violence using mosaicking and DFE- WLSRF: Deep feature extraction with weighted least square with random forest

Article 11 October 2023

A multi-stream CNN for deep violence detection in video sequences using handcrafted features

Article 26 July 2021

Violence Detection in Videos Using Transfer Learning and LSTM

References

Caetano, C.; Bremond, F.; Schwartz, W.: Skeleton image representation for 3D action recognition based on tree structure and reference joints. In: 2019 32nd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI) 2019, pp. 1–8 (2019).
Rahmani, H.; Mian, A.; Shah, M.: Learning a deep model for human action recognition from novel viewpoints. IEEE Trans. Pattern Anal. Mach. Intell. 40(3), 667–681 (2018)
Article Google Scholar
Li, X.; Chuah, M.: ReHAR: robust and efficient human activity recognition. In: IEEE Winter Conference on Applications of Computer Vision. Lake Tahoe (2018)
Park, K.J.; Yang, S.: Deep neural networks for activity recognition with multi-sensor data in a smart home. In: 2018 IEEE 4th World Forum on Internet of Things (WF-IoT). Singapore, Singapore (2018)
Ullah, F.; Ullah, A.; Muhammad, K.; Haq, I.; Baik, S.: Violence detection using spatiotemporal features with 3D convolutional neural network. Sensors 19(11), 2472 (2019)
Article Google Scholar
Ramzan, M.; et al.: A review on state-of-the-art violence detection techniques. IEEE Access 7, 107560–107575 (2019)
Article Google Scholar
Yi, Y.; Wang, H.: Motion keypoint trajectory and covariance descriptor for human action recognition. Vis. Comput. 34(3), 391–403 (2017)
Article Google Scholar
Li, X.; Wang, D.; Zhang, Y.: Representation for action recognition using trajectory-based low-level local feature and mid-level motion feature. Appl. Comput. Intell. Soft Comput. 2017, 1–7 (2017)
Google Scholar
Zhou, L.; Nagahashi, H.: Real-time action recognition based on key frame detection. In: Proceedings of the 9th International Conference on Machine Learning and Computing, pp. 272–277. Singapore (2018)
Shao, L.; Zhen, X.; Tao, D.; Li, X.: Spatio-temporal laplacian pyramid coding for action recognition. IEEE Trans. Cybern. 44(6), 817–827 (2014)
Article Google Scholar
Sudhakaran, S.; Lanz, O.: Learning to detect violent videos using convolutional long short-term memory. In: International Conference on Advanced Video and Signal Based Surveillance (2017)
Ten, M.; Chen, B.; Pang, R.; Vasudevan, V.; Sandler, M.; Howard, A.; Le, V.: MnasNet: Platform-aware neural architecture search for mobile. In: CVPR (2019)
Naik, A.; Gopalakrishna, M.: Violence detection in surveillance video-a survey. Int. J. Latest Res. Eng. Technol. IJLRET, pp. 11–17 (2016)
Hassner, T.; Itcher, Y.; Kliper-Gross, O.: Violent flows: real-time detection of violent crowd behaviour. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Providence (2012)
Gao, Y.; Liu, H.; Sun, X.; Wang, C.; Liu, Y.: Violence detection using oriented violent flows. Image Vis. Comput. 48–49, 37–41 (2016)
Article Google Scholar
Zhang, T.; Yang, Z.; Jia, W.; Yang, B.; Yang, J.; He, X.: A new method for violence detection in surveillance scenes. Multimedia Tools Appl. 75(12), 7327–7349 (2015)
Article Google Scholar
Moreira, D.; et al.: Temporal robust features for violence detection. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV) (2017)
Bermejo Nievas, E.; Deniz Suarez, O.; Bueno García, G.; Sukthankar, R.: Violence detection in video using computer vision techniques. In: Real, P., Diaz-Pernil, D., Molina-Abril, H., Berciano, A., Kropatsch, W. (eds.) Computer Analysis of Images and Patterns (CAIP) (2011)
Deniz, O.; Serrano, I.; Bueno, G.; Kim, T.: Fast violence detection in video. In: 2014 International Conference on Computer Vision Theory and Applications (VISAPP) (2014)
Vashistha, P.; Bhatnagar, C.; Khan, M.: An architecture to identify violence in video surveillance system using ViF and LBP. In: 2018 4th International Conference on Recent Advances in Information Technology (RAIT) (2018)
Serrano Gracia, I.; Deniz Suarez, O.; Bueno Garcia, G.; Kim, T.: Fast fight detection. PLoS ONE 10(4), e0120448 (2015)
Article Google Scholar
Xia, Q.; Zhang, P.; Wang, J.; Tian, M.; Fei, C.: Real time violence detection based on deep spatio-temporal features. In: Zhou, J. et al. (eds.) Biometric Recognition (CCBR), pp. 157–165 (2018)
Zhou, P.; Ding, Q.; Luo, H.; Hou, X.: Violent interaction detection in video based on deep learning. J. Phys. Conf. Ser. 844, 012044 (2017)
Article Google Scholar
Ding, C.; Fan, S.; Zhu, M.; Feng, W.; Jia, B.: Violence detection in video by using 3D convolutional neural networks. In: Bebis, G. et al. (eds.) Advances in Visual Computing (ISVC), pp. 551–558 (2014)
Dong, Z.; Qin, J.; Wang, Y.: Multi-stream deep networks for person to person violence detection in videos. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds.) Pattern Recognition (CCPR), pp. 517–531 (2016)
Xingjian, S.; Chen, Z.; Wang, H.; Yeung, D.; Wong, Y.: Convolutional LSTM Network: A machine learning approach for precipitation nowcasting. In: Advances in Neural Information Processing Systems, pp. 802–810 (2015)
Hanson, A.; Pnvr, K.; Krishnagopal, S.; Davis, L.: Bidirectional convolutional LSTM for the detection of violence in videos. In: Lecture Notes in Computer Science, pp. 280–295 (2019)
Peixoto, B.; Lavi, B.; Pereira Martin, J.P.; Avila, S. Dias, Z.; Rocha, A.: Toward subjective violence detection in videos. In: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8276–8280 (2019)
Halder, R.; Chatterjee, R.: CNN-BiLSTM model for violence detection in smart surveillance. In: SN Computer Science (2020) 1:201, pp. 1–9 (2020)
Sharma, M.; Baghel, R.: Video surveillance for violence detection using deep learning. In: Borah, S., Emilia Balas, V., Polkowski, Z. (eds.) Advances in Data Science and Management, pp. 411–420 (2020)
Aktı, S.; Ayse, G.; Ekenel, H.: Vision-based fight detection from surveillance cameras. In: IEEE/EURASIP 9th International Conference on Image Processing Theory, Tools and Applications (2020)
Serrano, I.; Deniz, O.; Espinosa-Aranda, J.; Bueno, G.: Fight recognition in video using hough forests and 2D convolutional neural network. IEEE Trans. Image Process. 27(10), 4787–4797 (2018)
Article MathSciNet Google Scholar
Meng, Z.; Yuan, J.; Li, Z.: Trajectory-pooled deep convolutional networks for violence detection in videos. In: Lecture Notes in Computer Science, pp. 437–447 (2017)
Tan, M.: MnasNet: towards automating the design of mobile machine learning models. Google AI Blog (2019). [Online]. Available: https://ai.googleblog.com/2018/08/mnasnet-towards-automating-design-of.html. Accessed 12 Sept 2019
Ryan, J.; Savakis, A.: Anomaly detection in video using predictive convolutional long short-term memory networks. In: Computer Vision and Pattern Recognition (cs.CV) (2016)
Patraucean, A.H.; Cipolla, R.: Spatio-temporal video autoencoder with differentiable memory. In: ICLR Workshop (2016)
Asaithambi, S.: Why, how and when to scale your features. Medium (2018). [Online]. Available: https://medium.com/greyatom/why-how-and-when-to-scale-your-features-4b30ab09db5e. Accessed 5 Sept 2019.
Maklin, C.: Linear discriminant analysis in python. Medium (2019). [Online]. Available: https://towardsdatascience.com/linear-discriminant-analysis-in-python-76b8b17817c2. Accessed 12 Oct 2019
Ben Fraj, M.: In depth: parameter tuning for random forest. Medium (2019). [Online]. Available: https://medium.com/all-things-ai/in-depth-parameter-tuning-for-random-forest-d67bb7e920d. Accessed 12 Sept 2019
Plapinger, T.: Tuning a random forest classifier. Medium (2019). [Online]. Available: https://medium.com/@taplapinger/tuning-a-random-forest-classifier-1b252d1dde92. Accessed 12 Sept 2019.
Pupale, R.: Support vector machines (SVM)—an overview. Medium (2019). [Online]. Available: https://towardsdatascience.com/https-medium-com-pupalerushikesh-svm-f4b42800e989. Accessed 12 Sept 2019
Ben Fraj, M.: In depth: parameter tuning for KNN. Medium (2019). [Online]. Available: https://medium.com/@mohtedibf/in-depth-parameter-tuning-for-knn-4c0de485baf6. Accessed 12 Sept 2019
Deis, A.: Data augmentation for deep learning. Medium (2019). [Online]. Available: https://towardsdatascience.com/data-augmentation-for-deep-learning-4fe21d1a4eb9. Accessed 7 Sept 2019
torchvision.transforms—PyTorch master documentation. Pytorch.org (2019). [Online]. Available: https://pytorch.org/docs/stable/torchvision/transforms.html. Accessed 7 Sept 2019
Cioloboc, F.: Ideas on how to fine-tune a pre-trained model in PyTorch. Medium (2019). [Online]. Available: https://medium.com/udacity-pytorch-challengers/ideas-on-how-to-fine-tune-a-pre-trained-model-in-pytorch-184c47185a20. Accessed 5 Sept 2019
scikit-learn developers: Linear and quadratic discriminant analysis. scikti-learn 0.23.2 (2020). [online]- Available: https://scikit-learn.org/stable/modules/lda_qda.html. [Accessed: 11- Nov- 2020].
Shung, K.: Accuracy, precision, recall or F1? Medium (2018). [Online]. Available: https://towardsdatascience.com/accuracy-precision-recall-or-f1-331fb37c5cb9. Accessed 08 Oct 2019

Download references

Acknowledgements

The authors are grateful to DSR technical and financial support in King Abdulaziz university. This project was funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah, under Grant No. DF-421-165-1441.

Author information

Authors and Affiliations

Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, 21589, Saudi Arabia
Heyam M. Bin Jahlan & Lamiaa A. Elrefaei
Electrical Engineering Department, Faculty of Engineering at Shoubra, Benha University, Cairo, 11629, Egypt
Lamiaa A. Elrefaei

Authors

Heyam M. Bin Jahlan
View author publications
You can also search for this author in PubMed Google Scholar
Lamiaa A. Elrefaei
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Heyam M. Bin Jahlan.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Jahlan, H.M.B., Elrefaei, L.A. Mobile Neural Architecture Search Network and Convolutional Long Short-Term Memory-Based Deep Features Toward Detecting Violence from Video. Arab J Sci Eng 46, 8549–8563 (2021). https://doi.org/10.1007/s13369-021-05589-5

Download citation

Received: 14 July 2020
Accepted: 21 March 2021
Published: 08 April 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s13369-021-05589-5

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Mobile Neural Architecture Search Network and Convolutional Long Short-Term Memory-Based Deep Features Toward Detecting Violence from Video

Abstract

Access this article

Similar content being viewed by others

Detection of violence using mosaicking and DFE- WLSRF: Deep feature extraction with weighted least square with random forest

A multi-stream CNN for deep violence detection in video sequences using handcrafted features

Violence Detection in Videos Using Transfer Learning and LSTM

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Mobile Neural Architecture Search Network and Convolutional Long Short-Term Memory-Based Deep Features Toward Detecting Violence from Video

Abstract

Access this article

Similar content being viewed by others

Detection of violence using mosaicking and DFE- WLSRF: Deep feature extraction with weighted least square with random forest

A multi-stream CNN for deep violence detection in video sequences using handcrafted features

Violence Detection in Videos Using Transfer Learning and LSTM

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation