Elsevier

Pattern Recognition

Volume 116, August 2021, 107969

NM-GAN: Noise-modulated generative adversarial network for video anomaly detection

https://doi.org/10.1016/j.patcog.2021.107969

Highlights

  • A more accurate and stable model for video anomaly detection is achieved within a refined end-to-end GAN-like architecture.

  • The reconstruction network has stronger and more controllable generalization ability.

  • The discrimination network uses the reconstruction error map to distinguish anomaly samples.

  • The proposed noise-modulated adversarial learning method enhances the ability of the discriminator to detect anomalies.

Abstract

As an important and challenging task for intelligent video surveillance systems, video anomaly detection generally refers to the automatic recognition of video frames that contain abnormal targets, behaviors or events. Although it has been widely applied in real scenes, anomaly detection remains challenging because the definition of an anomaly is vague and anomaly samples are scarce. Inspired by the widespread application of the Generative Adversarial Network (GAN), we propose an end-to-end pipeline called NM-GAN, which assembles an encoder-decoder reconstruction network and a CNN-based discrimination network in a GAN-like architecture. The generalization ability of the reconstruction network is properly modulated via adversarial learning around reconstruction error maps and noise maps. Meanwhile, the discrimination network is trained to distinguish anomaly samples from normal samples based on the reconstruction error maps. Finally, the output of the discrimination network is used to evaluate the anomaly score of the input frame. Thorough proof-of-principle experiments and ablation tests on several popular datasets reveal that the proposed model significantly enhances the generalization ability of the reconstruction network and the distinguishability of the discrimination network. The comparison with the state of the art shows that the proposed NM-GAN model outperforms most competing models in precision and stability.

Introduction

With the rapid development and wide use of video surveillance systems, the conventional manual analysis for labelling anomalies (such as traffic accidents, robberies, violent fights, etc.) in the large amount of video data captured by public-place monitoring is costly and does not meet the actual requirements of video surveillance systems. Therefore, an intelligent surveillance system that can recognize and detect anomalies is urgently needed, and it has been a hotspot at the intersection of computer vision and pattern recognition in recent years [1], [2].

Video anomaly detection is the task of finding and identifying anomalous targets, behaviors, and events in video data. It describes and quantifies all kinds of abnormal events according to a unified criterion rather than using specific semantic labels for fine-grained classification. The most challenging issue in video anomaly detection is that the definition of anomaly is indefinite. In general, “anomaly” refers to observations that occur infrequently and do not conform to expectations, or that are significantly different from most objects in a scene, which means anomalies are defined relative to normal events rather than by their own classes or details. In practice, samples that appear frequently in the dataset are usually assumed to be normal, while samples that are rarely observed in the dataset are assumed to be anomalies. The training set of existing anomaly detection datasets contains only normal event samples, while the testing set contains both normal and abnormal event samples. Therefore, the essence of anomaly detection is novelty detection, which is also called one-class learning. Video anomaly detection can be applied to intelligent surveillance systems [3], defect detection [4], medical image processing [5], fault diagnosis [6], and so on.

Early anomaly detection methods mostly matched hand-crafted features against domain knowledge. With the rapid development of deep learning, end-to-end frameworks based on deep neural networks, including probability estimation [7], one-class learning [8], [9], frame reconstruction and prediction [10], [11], adversarial learning [8], [12] and so on, have gradually become dominant because of their advantages in detection speed, robustness and accuracy. Among these methods, the frame-reconstruction approach has received broad attention. A typical reconstruction-based anomaly detection model supposes that anomaly samples cannot be effectively recovered by a reconstruction network that is trained only with normal samples. Therefore, these methods use the reconstruction error to estimate how anomalous a new sample is. A satisfactory result is obtained in [13], which uses an Auto-encoder network to reconstruct the frame sequence and calculates the mean squared reconstruction error for anomaly detection. A two-branch model [14] that combines reconstruction and prediction with a 3D convolution network is proposed to enhance the learning of the motion information of samples. Some works introduce adversarial learning into reconstruction methods. In [15] the authors describe an adversarial learning framework based on an Auto-encoder, which detects anomalies by combining appearance and motion features in a two-stream structure. In contrast, in [11] the authors use the error between the predicted frame, which is obtained by maximizing the expectation of the next frame given prior knowledge of the past frames, and the ground truth to quantify the anomaly of the video frame. A novel adversarial auto-encoder [16] within an encoder-decoder-encoder pipeline is designed to capture the training data distribution in both image and latent vector space to implement anomaly detection efficiently. ST-CaAE [17] cascades a spatial-temporal adversarial auto-encoder and a spatial-temporal convolutional auto-encoder to build a two-stream framework for detecting anomalies. Though the reconstruction-based approach is criticized for its uncertainty on unobserved samples, it can provide a multi-scale representation with higher spatial resolution for the video frames, and the learning of the reconstruction network is generally independent of any prior knowledge and class labels, which makes it better suited to real applications. Therefore, the proposed model follows the reconstruction-based approach.
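The reconstruction-error idea described above can be illustrated with a minimal sketch. It is not the implementation of any cited method: `toy_reconstruct` is a hypothetical stand-in for a trained reconstruction network (it simply cannot reproduce pixel values above 0.5), and the min-max normalisation of per-frame errors is an assumed convention rather than something stated in the text.

```python
from typing import Callable, List

Frame = List[List[float]]  # grayscale frame: rows of pixel intensities in [0, 1]


def mse(frame: Frame, recon: Frame) -> float:
    """Mean squared reconstruction error over all pixels of one frame."""
    errs = [(a - b) ** 2
            for row_f, row_r in zip(frame, recon)
            for a, b in zip(row_f, row_r)]
    return sum(errs) / len(errs)


def anomaly_scores(frames: List[Frame],
                   reconstruct: Callable[[Frame], Frame]) -> List[float]:
    """Min-max normalised per-frame MSE; a higher score means more anomalous."""
    raw = [mse(f, reconstruct(f)) for f in frames]
    lo, hi = min(raw), max(raw)
    if hi == lo:  # all frames reconstructed equally well
        return [0.0 for _ in raw]
    return [(r - lo) / (hi - lo) for r in raw]


def toy_reconstruct(frame: Frame) -> Frame:
    """Hypothetical stand-in for a network trained on normal (dark) frames:
    it reproduces values up to 0.5 and clips anything brighter."""
    return [[min(p, 0.5) for p in row] for row in frame]


normal = [[0.2, 0.3], [0.4, 0.1]]
anomalous = [[0.2, 0.9], [0.4, 0.1]]  # one bright pixel the stand-in cannot recover
scores = anomaly_scores([normal, anomalous], toy_reconstruct)
```

In this toy setting the normal frame is reconstructed exactly, so it receives the lowest score, while the frame with the unrecoverable bright pixel receives the highest.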

To further improve the performance of anomaly detection algorithms, we propose an end-to-end anomaly detection framework, NM-GAN, which combines the reconstruction approach [18], [19] with the GAN-based approach [8]. First, as shown in Fig. 1, all normal samples with additive white noise are fed into the reconstruction network, which is expected to make the corresponding reconstruction error maps follow a pre-defined normal distribution. Conversely, the reconstruction error maps will deviate from the pre-defined normal distribution if the inputs are anomaly samples. A discrimination network detects the anomaly frames by discovering the difference between the reconstruction error maps of normal samples and those of anomaly samples.
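The intuition behind the noise-modulated error maps can be sketched numerically. The following is only an illustration under strong assumptions, not the NM-GAN training procedure: the error map is taken to be the signed difference between the noisy input and the reconstruction, the reconstruction network is replaced by an idealised stand-in that always returns the clean background, and `SIGMA` is a hypothetical value for the standard deviation of the pre-defined noise distribution.

```python
import random
import statistics

SIGMA = 0.1  # assumed standard deviation of the pre-defined normal distribution


def add_white_noise(frame, sigma, rng):
    """Add zero-mean Gaussian white noise to every pixel."""
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in frame]


def signed_error_map(inp, recon):
    """Per-pixel signed difference between input and reconstruction (assumed form)."""
    return [[a - b for a, b in zip(ri, rr)] for ri, rr in zip(inp, recon)]


def map_stats(err):
    """Mean and standard deviation of all values in an error map."""
    flat = [e for row in err for e in row]
    return statistics.fmean(flat), statistics.pstdev(flat)


rng = random.Random(0)
clean = [[0.5] * 100 for _ in range(100)]  # toy normal frame
noisy = add_white_noise(clean, SIGMA, rng)

# Idealised stand-in for the reconstruction network: it returns the clean frame,
# so the error map of a normal sample is exactly the injected noise.
recon = clean
normal_mean, normal_std = map_stats(signed_error_map(noisy, recon))

# Toy anomaly: a bright patch that the stand-in network does not reproduce.
anomalous = [row[:] for row in noisy]
for r in range(40, 60):
    for c in range(40, 60):
        anomalous[r][c] += 0.4
anom_mean, anom_std = map_stats(signed_error_map(anomalous, recon))
```

Under these assumptions the error map of the normal sample has statistics close to the injected noise (mean near 0, standard deviation near `SIGMA`), whereas the anomalous sample shifts both, which is the kind of deviation a discrimination network could learn to detect.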

In summary, the main contributions of this paper are as follows: (1) The task of video anomaly detection is realized by distinguishing the reconstruction error map of the input frame with a CNN-based discrimination network. (2) A new scheme of noise-modulated adversarial learning is proposed to improve the generalization ability of the reconstruction network as well as the distinguishability of the discrimination network. Our model learns in an end-to-end fashion and achieves state-of-the-art performance on multiple challenging datasets for video anomaly detection.

The rest of the paper is organized as follows. In Section 2, we discuss related work on reconstruction-based models and adversarial learning models in anomaly detection. We introduce the overall structure of NM-GAN in Section 3. Then, in Section 4, the anomaly detection approach is optimized and verified through a series of experiments. We conclude the paper with a summary, limitations, and future study in Section 5.

Section snippets

Reconstruction-based models

The earliest reconstruction model [18] used an image-to-image encoder-decoder network to reconstruct the input images, which yielded a small reconstruction error for normal samples and a large reconstruction error for abnormal samples, so that the anomaly of input images can be estimated according to the reconstruction error. In [19], the authors combine convolution LSTM and Auto-encoder network to achieve reconstruction and prediction based on video frame sequences, which enhances the

Motivation

The principle of the reconstruction-based approach originates from the hypothesis that the reconstruction network works properly on normal samples but fails to recover anomaly samples. As a consequence, the anomaly samples are supposed to come with larger reconstruction errors, which could be utilized to distinguish anomaly samples from normal samples. The nature of such a hypothesis is the overfitting of the reconstruction network, which reduces its generalization ability of reconstructing anomaly

Datasets

To validate the proposed anomaly detection model for practical applications of video surveillance, three popular benchmark datasets, UCSD, CUHK Avenue, and ShanghaiTech Campus, are used in this section.

The UCSD [34] dataset includes two subsets, Ped1 and Ped2, captured by two different outdoor surveillance cameras at 10 fps with image sizes of 158 × 234 and 240 × 360, respectively. Ped1 contains 34 training videos and 36 testing videos, including 40 anomalies. Ped2 is composed of 16

Conclusion

In this paper, we started by revealing the shortcomings of the traditional approach of reconstruction-based anomaly detection. Then we proposed an anomaly detection model referred to as NM-GAN, which consists of the reconstruction network R and the discrimination network D in a GAN-like architecture. The significant contribution of our work is to modulate the generalization ability of the network R and the distinguishability of the network D simultaneously by embedding the noise map into the

CRediT authorship contribution statement

Dongyue Chen: Conceptualization, Methodology, Writing - review & editing. Lingyi Yue: Conceptualization, Methodology, Software, Writing - original draft. Xingya Chang: Validation, Investigation, Writing - review & editing. Ming Xu: Formal analysis, Writing - review & editing. Tong Jia: Data curation, Supervision, Investigation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant U1613214 and the National Key Research and Development Program of China under Grant 2018YFB1404101.

References (40)

  • Q. Sun et al.

    Online growing neural gas for anomaly detection in changing surveillance scenes

    Pattern Recognit

    (2017)
  • M. Sabokrou et al.

    Deep-anomaly: fully convolutional neural network for fast anomaly detection in crowded scenes

    Comput. Vision Image Understanding

    (2018)
  • Y.S. Chong et al.

    Abnormal event detection in videos using spatiotemporal autoencoder

    International Symposium on Neural Networks

    (2017)
  • R. Ye et al.

    Collective representation for abnormal event detection

    J Comput Sci Technol

    (2017)
  • M. Shuang et al.

    Automatic fabric defect detection with a multi-scale convolutional denoising autoencoder network model

    Sensors

    (2018)
  • T. Schlegl et al.

    Unsupervised anomaly detection with generative adversarial networks to guide marker discovery

    International Conference on Information Processing in Medical Imaging. Springer, Cham

    (2017)
  • D. Li et al.

    A method of anomaly detection and fault diagnosis with online adaptive learning under small training samples

    Pattern Recognit

    (2018)
  • D. Abati et al.

    Latent space autoregression for novelty detection

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2019)
  • M. Sabokrou et al.

    Adversarially learned one-class classifier for novelty detection

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2018)
  • J. Wang et al.

    Gods: generalized one-class discriminative subspaces for anomaly detection

    International Conference on Computer Vision

    (2019)
  • M. Xu et al.

    An efficient anomaly detection system for crowded scenes using variational autoencoders

    Applied Sciences

    (2019)
  • W. Liu et al.

    Future frame prediction for anomaly detection - a new baseline

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2018)
  • M. Sabokrou et al.

    Deep end-to-end one-class classifier

    IEEE Trans Neural Netw Learn Syst

    (2020)
  • M. Hasan et al.

    Learning temporal regularity in video sequences

    Proceedings of the IEEE conference on computer vision and pattern recognition

    (2016)
  • Y. Zhao et al.

    Spatio-temporal autoencoder for video anomaly detection

    Proceedings of the 25th ACM international conference on Multimedia

    (2017)
  • M. Ravanbakhsh et al.

    Abnormal event detection in videos using generative adversarial nets

    2017 IEEE International Conference on Image Processing

    (2017)
  • A. Samet et al.

    Ganomaly: semi-supervised anomaly detection via adversarial training

    Asian Conference on Computer Vision. Springer, Cham

    (2018)
  • N. Li et al.

    Spatial-temporal cascade autoencoder for video anomaly detection in crowded scenes

    IEEE Trans Multimedia

    (2020)
  • R. Manassés et al.

    A study of deep convolutional auto-encoders for anomaly detection in videos

    Pattern Recognit Lett

    (2018)
  • D. Xu et al.

    Detecting anomalous events in videos by learning deep representations of appearance and motion

    Computer Vision and Image Understanding

    (2017)

    Dongyue Chen received a Bachelor's degree in computer science and a Ph.D. degree in Pattern Recognition and Intelligent System from Fudan University, China, in 2002 and 2007, respectively. He is currently a Professor with the College of Information Science and Engineering, Northeastern University, China. His main research fields are computer vision and deep learning, including bionic visual significance computing model, person re-identification, anomaly detection, behavior recognition and scene understanding, and medical image processing.

    Lingyi Yue received a Bachelor's degree in Automation from Xiangtan University, China. She has been a postgraduate student at the College of Information Science and Engineering of Northeastern University, Shenyang, China, since 2018. Her research interests include computer vision, anomaly detection, and pattern recognition.

    Xingya Chang is currently pursuing a Ph.D. at the College of Information Science and Engineering of Northeastern University, Shenyang, China. His area of expertise revolves around anomaly detection and computer vision.

    Ming Xu received a Bachelor's degree in Automation and a Master's degree in Pattern Recognition and Intelligent System from Northeastern University, China. He is currently a Ph.D. student. His research covers computer vision, image processing and machine learning, with a focus on object detection and recognition in visible-light images, infrared thermal images and medical images.

    Tong Jia was born in Shenyang, China, in 1975. He received the bachelor's degree in computer science and the Ph.D. degree in pattern identification and intelligent system from Northeastern University, China, in 1998 and 2008, respectively. He is currently a Professor with the College of Information Science and Engineering, Northeastern University. His research interests include computer/machine vision, image processing, and pattern identification.
