NM-GAN: Noise-modulated generative adversarial network for video anomaly detection

doi:10.1016/j.patcog.2021.107969

Pattern Recognition

Volume 116, August 2021, 107969

https://doi.org/10.1016/j.patcog.2021.107969

Highlights

• A more accurate and stable model for video anomaly detection is achieved within a refined end-to-end GAN-like architecture.
• The reconstruction network has stronger and more controllable generalization ability.
• The discrimination network uses the reconstruction error map to distinguish anomaly samples.
• The proposed noise-modulated adversarial learning method enhances the ability of the discriminator to detect anomalies.

Abstract

As an important and challenging task for intelligent video surveillance systems, video anomaly detection generally refers to the automatic recognition of video frames that contain abnormal targets, behaviors or events. Although it has been widely applied in real scenes, anomaly detection remains challenging because of the vague definition of anomaly and the lack of anomaly samples. Inspired by the widespread application of Generative Adversarial Networks (GANs), we propose an end-to-end pipeline called NM-GAN, which assembles an encoder-decoder reconstruction network and a CNN-based discrimination network in a GAN-like architecture. The generalization ability of the reconstruction network is properly modulated via adversarial learning on reconstruction error maps and noise maps. Meanwhile, the discrimination network is trained to distinguish anomaly samples from normal samples based on the reconstruction error maps. Finally, the output of the discrimination network is used to evaluate the anomaly score of the input frame. Thorough proof-of-principle experiments and ablation tests on several popular datasets reveal that the proposed model significantly enhances the generalization ability of the reconstruction network and the distinguishability of the discrimination network. The comparison with state-of-the-art methods shows that the proposed NM-GAN model outperforms most competing models in precision and stability.


Introduction

With the rapid development and wide use of video surveillance systems, conventional manual analysis for labelling anomalies (such as traffic accidents, robberies, violent fights, etc.) in the large amount of video data captured by public surveillance cameras is costly and no longer meets the actual requirements of video surveillance systems. Therefore, an intelligent surveillance system that can recognize and detect anomalies is urgently needed, and such systems have been a hotspot at the intersection of computer vision and pattern recognition in recent years [1], [2].

Video anomaly detection is the task of finding and identifying anomalous targets, behaviors, and events in video data. It describes and quantifies all kinds of abnormal events according to a unified criterion rather than assigning specific semantic labels for fine-grained classification. The most challenging issue in video anomaly detection is that the definition of anomaly is indefinite. In general, an “anomaly” refers to an observation that occurs infrequently and does not conform to expectations, or that is significantly different from most objects in a scene; in other words, anomalies are defined relative to normal events rather than by their own categories or details. In practice, samples that appear frequently in a dataset are usually assumed to be normal, while rarely observed samples are treated as anomalies. The training sets of existing anomaly detection datasets contain only normal event samples, while the testing sets contain both normal and abnormal event samples. Therefore, anomaly detection is essentially novelty detection, which is also called one-class learning. Anomaly detection can be applied to intelligent surveillance systems [3], defect detection [4], medical image processing [5], fault diagnosis [6], and so on.
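Because only normal samples are available for training, any decision rule has to be calibrated on normal data alone. The sketch below is a generic one-class illustration, not NM-GAN's scoring procedure: it assumes that some per-sample anomaly score already exists (the scores here are synthetic) and places the threshold at an assumed 99th percentile of the normal training scores. The helper names `fit_one_class_threshold` and `flag_anomalies` are hypothetical.

```python
import numpy as np

def fit_one_class_threshold(normal_scores, quantile=0.99):
    """Learn a threshold from normal-only training scores.

    Scores above this quantile of the normal training scores are
    treated as novel (anomalous) at test time.
    """
    return float(np.quantile(np.asarray(normal_scores, dtype=float), quantile))

def flag_anomalies(test_scores, threshold):
    """Boolean mask that is True where a test score exceeds the threshold."""
    return np.asarray(test_scores, dtype=float) > threshold

rng = np.random.default_rng(0)
# Synthetic scores: the training set holds only normal samples,
# while the test set mixes 95 normal and 5 abnormal samples.
train_normal = rng.normal(loc=1.0, scale=0.1, size=1000)
test = np.concatenate([rng.normal(1.0, 0.1, size=95), rng.normal(2.0, 0.1, size=5)])

thr = fit_one_class_threshold(train_normal)
mask = flag_anomalies(test, thr)
```

The quantile trades false alarms on normal data against missed anomalies; it is a free parameter of this sketch, not a value from the paper.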

Early anomaly detection methods mostly used frameworks that match handcrafted features with domain knowledge. With the rapid development of deep learning, however, end-to-end frameworks based on deep neural networks, including probability estimation [7], one-class learning [8], [9], frame reconstruction and prediction [10], [11], adversarial learning [8], [12], and so on, have gradually become dominant because of their advantages in detection speed, robustness and accuracy. Among these methods, the frame-reconstruction approach has received broad attention. A typical reconstruction-based anomaly detection model assumes that a reconstruction network trained only on normal samples cannot effectively recover anomaly samples. These methods therefore use the reconstruction error to estimate the anomaly of new samples. A satisfactory result is obtained in [13], which uses an auto-encoder network to reconstruct frame sequences and computes the mean squared error of the reconstruction for anomaly detection. A two-branch model [14] that combines reconstruction and prediction with a 3D convolutional network is proposed to enhance the learning of motion information. Some works introduce adversarial learning into reconstruction methods. In [15] the authors describe an adversarial learning framework based on an auto-encoder, which detects anomalies by combining appearance and motion features in a two-stream structure. In contrast, the authors of [11] predict the next frame from the preceding frames and quantify the anomaly of a video frame by the error between the predicted frame and the ground truth. A novel adversarial auto-encoder [16] with an encoder-decoder-encoder pipeline is designed to capture the training data distribution in both the image space and the latent vector space, enabling efficient anomaly detection.
ST-CaAE [17] cascades a spatial-temporal adversarial auto-encoder and a spatial-temporal convolutional auto-encoder to build a two-stream framework for anomaly detection. Although the reconstruction-based approach is criticized for its uncertain behavior on unobserved samples, it provides multi-scale representations of video frames at high spatial resolution, and training the reconstruction network generally requires no prior knowledge or class labels, which makes it well suited to real applications. Therefore, the proposed model follows the reconstruction-based approach.
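The reconstruction-error scoring used by methods such as [13] and [11] can be sketched as follows. This is an illustration under stated assumptions, not NM-GAN's own scoring, which uses the output of the discrimination network: it computes a per-frame mean squared error, converts it to PSNR, and min-max normalizes PSNR within one video so that poorly reconstructed frames receive scores near 1. The per-video normalization mirrors the PSNR normalization used in prediction-based work such as [11]; the function names, the pixel range `max_val=1.0` and the synthetic frames are assumptions.

```python
import numpy as np

def frame_mse(frame, recon):
    """Mean squared reconstruction error of one frame (any shape)."""
    frame = np.asarray(frame, dtype=np.float64)
    recon = np.asarray(recon, dtype=np.float64)
    return float(np.mean((frame - recon) ** 2))

def psnr(frame, recon, max_val=1.0, eps=1e-10):
    """Peak signal-to-noise ratio in dB; eps avoids log(0) for a perfect recon."""
    return 10.0 * np.log10(max_val ** 2 / (frame_mse(frame, recon) + eps))

def anomaly_scores(frames, recons, max_val=1.0):
    """Per-video min-max normalized anomaly scores in [0, 1].

    Regularity is PSNR rescaled to [0, 1] within the video; the anomaly
    score is 1 - regularity, so badly reconstructed frames score high.
    """
    p = np.array([psnr(f, r, max_val) for f, r in zip(frames, recons)])
    span = p.max() - p.min()
    if span == 0:  # every frame is reconstructed equally well
        return np.zeros_like(p)
    regularity = (p - p.min()) / span
    return 1.0 - regularity

# Usage on synthetic data: five 8x8 frames, all reconstructed with small
# errors except frame 3, which the (hypothetical) network gets badly wrong.
rng = np.random.default_rng(0)
frames = rng.random((5, 8, 8))
recons = frames + rng.normal(0.0, 0.01, frames.shape)
recons[3] += 0.3
scores = anomaly_scores(frames, recons)
```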

To further improve the performance of anomaly detection algorithms, we propose an end-to-end anomaly detection framework, NM-GAN, which combines the reconstruction approach [18], [19] and the GAN-based approach [8]. First, as shown in Fig. 1, all normal samples with additive white noise are fed into the reconstruction network, which is expected to make the corresponding reconstruction error maps follow a pre-defined normal distribution. Conversely, the reconstruction error maps deviate from this pre-defined distribution if the inputs are anomaly samples. A discrimination network then exploits this difference between the reconstruction error maps of normal and anomaly samples to detect anomalous frames.
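The intuition behind noise modulation can be illustrated with a toy example. Everything here is an assumption made for illustration: the error map is taken to be the signed residual between the noisy input and the reconstruction, the "reconstruction network" is simulated rather than learned, the noise level `SIGMA = 0.1` is arbitrary, and the learned discriminator is replaced by a crude moment check (the hypothetical `moment_gap`) against the pre-defined distribution $\mathcal{N}(0, \sigma^2)$. If the network recovers a normal frame exactly, the residual is just the injected noise; a region the network cannot reproduce pushes the residual away from that distribution.

```python
import numpy as np

SIGMA = 0.1  # std of the pre-defined noise distribution (assumed value)

def error_map(noisy_input, reconstruction):
    """Assumed definition: signed residual between noisy input and output."""
    return noisy_input - reconstruction

def moment_gap(err, sigma=SIGMA):
    """Crude distance between an error map and N(0, sigma^2).

    Stand-in for the learned discriminator: it is 0 when the empirical
    mean is 0 and the empirical variance equals sigma^2, larger otherwise.
    """
    err = np.asarray(err, dtype=np.float64)
    return abs(err.mean()) / sigma + abs(err.var() / sigma**2 - 1.0)

rng = np.random.default_rng(0)
clean = rng.random((64, 64))                    # hypothetical normal frame
noise = rng.normal(0.0, SIGMA, clean.shape)     # additive white noise map
noisy = clean + noise

# Toy "reconstruction network" that recovers normal content exactly ...
recon_normal = clean
# ... but cannot reproduce a central region it never saw during training.
recon_anomalous = clean.copy()
recon_anomalous[16:48, 16:48] = 0.5

gap_normal = moment_gap(error_map(noisy, recon_normal))        # residual == noise
gap_anomalous = moment_gap(error_map(noisy, recon_anomalous))  # residual has structure
```

A trained discriminator can pick up far subtler deviations than the first two moments checked here; the moment check only makes the "follows / deviates from the noise distribution" idea concrete.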

In summary, the main contributions of this paper are as follows: (1) Video anomaly detection is accomplished by discriminating the reconstruction error map of the input frame with a CNN-based discrimination network. (2) A new noise-modulated adversarial learning scheme is proposed to improve both the generalization ability of the reconstruction network and the distinguishability of the discrimination network. Our model is trained end-to-end and achieves state-of-the-art performance on multiple challenging video anomaly detection datasets.

The rest of the paper is organized as follows. In Section 2, we discuss related work on reconstruction-based models and adversarial learning models in anomaly detection. Section 3 introduces the overall structure of NM-GAN. Section 4 then optimizes and validates the proposed anomaly detection approach through a series of experiments. We conclude the paper with a summary, limitations, and future work in Section 5.


Reconstruction-based models

The earliest reconstruction model [18] used an image-to-image encoder-decoder network to reconstruct the input images, which produced small reconstruction errors for normal samples and large reconstruction errors for abnormal samples, so that the anomaly of an input image could be estimated from its reconstruction error. In [19], the authors combine convolutional LSTM and an auto-encoder network to perform reconstruction and prediction on video frame sequences, which enhances the

Motivation

The principle of the reconstruction-based approach originates from the hypothesis that the reconstruction network works properly on normal samples but fails to recover anomaly samples. As a consequence, anomaly samples are expected to produce larger reconstruction errors, which can be used to distinguish anomaly samples from normal ones. In essence, this hypothesis relies on the overfitting of the reconstruction network, which reduces its generalization ability for reconstructing anomaly
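The hypothesis above, and its dependence on limited generalization, can be reproduced with a linear stand-in. In this sketch, which is not NM-GAN's network, PCA fitted only on synthetic normal data plays the role of the reconstruction network: samples near the learned subspace are reconstructed well, while off-subspace samples are not. Raising the rank to the full dimension (k = 20) lets the model reconstruct everything, anomalies included, which is the over-generalization a reconstruction-based detector must avoid. The data, dimensions and the helpers `fit_pca` and `reconstruction_error` are hypothetical.

```python
import numpy as np

def fit_pca(normal_data, k):
    """Fit a rank-k linear 'reconstruction network' (PCA) on normal data only."""
    mean = normal_data.mean(axis=0)
    # Rows of vt are orthonormal principal directions, largest variance first.
    _, _, vt = np.linalg.svd(normal_data - mean, full_matrices=False)
    return mean, vt[:k]

def reconstruction_error(x, mean, basis):
    """Squared error after projecting x onto the learned subspace and back."""
    centred = x - mean
    recon = centred @ basis.T @ basis
    return np.sum((centred - recon) ** 2, axis=-1)

rng = np.random.default_rng(0)
# Synthetic "normal" data: 20-D samples lying close to a 3-D subspace.
mixing = rng.normal(size=(3, 20))
normal_train = rng.normal(size=(500, 3)) @ mixing + 0.01 * rng.normal(size=(500, 20))
normal_test = rng.normal(size=(50, 3)) @ mixing + 0.01 * rng.normal(size=(50, 20))
anomalous_test = rng.normal(size=(50, 20))  # not confined to that subspace

# Limited capacity (k=3): normal samples reconstruct well, anomalies do not.
mean, basis = fit_pca(normal_train, k=3)
err_normal = reconstruction_error(normal_test, mean, basis)
err_anomalous = reconstruction_error(anomalous_test, mean, basis)

# Full capacity (k=20): anomalies are now reconstructed perfectly as well,
# so the reconstruction error no longer separates the two groups.
err_anomalous_full = reconstruction_error(anomalous_test, *fit_pca(normal_train, k=20))
```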

Datasets

To validate the proposed anomaly detection model for practical video surveillance applications, three popular benchmark datasets, UCSD, CUHK Avenue, and ShanghaiTech Campus, are used in this section.

The UCSD [34] dataset includes two subsets, Ped1 and Ped2, captured by two different outdoor surveillance cameras at 10 fps with image sizes of 158 $\times$ 234 and 240 $\times$ 360, respectively. Ped1 contains 34 training videos and 36 testing videos, including 40 anomalies. Ped2 is composed of 16

Conclusion

In this paper, we first revealed the shortcomings of the traditional reconstruction-based approach to anomaly detection. We then proposed an anomaly detection model, referred to as NM-GAN, which consists of a reconstruction network $R$ and a discrimination network $D$ in a GAN-like architecture. The main contribution of our work is to modulate the generalization ability of network $R$ and the discriminative ability of network $D$ simultaneously by embedding the noise map into the

CRediT authorship contribution statement

Dongyue Chen: Conceptualization, Methodology, Writing - review & editing. Lingyi Yue: Conceptualization, Methodology, Software, Writing - original draft. Xingya Chang: Validation, Investigation, Writing - review & editing. Ming Xu: Formal analysis, Writing - review & editing. Tong Jia: Data curation, Supervision, Investigation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant U1613214 and the National Key Research and Development Program of China under Grant 2018YFB1404101.

Dongyue Chen received a Bachelor's degree in computer science and a Ph.D. degree in Pattern Recognition and Intelligent System from Fudan University, China, in 2002 and 2007, respectively. He is currently a Professor with the College of Information Science and Engineering, Northeastern University, China. His main research fields are computer vision and deep learning, including bionic visual significance computing models, person re-identification, anomaly detection, behavior recognition and

References (40)

Q. Sun et al.
Online growing neural gas for anomaly detection in changing surveillance scenes
Pattern Recognit
(2017)
M. Sabokrou et al.
Deep-anomaly: fully convolutional neural network for fast anomaly detection in crowded scenes
Comput. Vision Image Understanding
(2018)
Y.S. Chong et al.
Abnormal event detection in videos using spatiotemporal autoencoder
International Symposium on Neural Networks
(2017)
R. Ye et al.
Collective representation for abnormal event detection
J Comput Sci Technol
(2017)
M. Shuang et al.
Automatic fabric defect detection with a multi-scale convolutional denoising autoencoder network model
Sensors
(2018)
T. Schlegl et al.
Unsupervised anomaly detection with generative adversarial networks to guide marker discovery
International Conference on Information Processing in Medical Imaging. Springer, Cham
(2017)
D. Li et al.
A method of anomaly detection and fault diagnosis with online adaptive learning under small training samples
Pattern Recognit
(2018)
D. Abati et al.
Latent space autoregression for novelty detection
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(2019)
M. Sabokrou et al.
Adversarially learned one-class classifier for novelty detection
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(2018)
J. Wang et al.
GODS: generalized one-class discriminative subspaces for anomaly detection
International Conference on Computer Vision
(2019)
M. Xu et al.
An efficient anomaly detection system for crowded scenes using variational autoencoders
Applied Sciences
(2019)
W. Liu et al.
Future frame prediction for anomaly detection: a new baseline
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(2018)
M. Sabokrou et al.
Deep end-to-end one-class classifier
IEEE Trans Neural Netw Learn Syst
(2020)
M. Hasan et al.
Learning temporal regularity in video sequences
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(2016)
Y. Zhao et al.
Spatio-temporal autoencoder for video anomaly detection
Proceedings of the 25th ACM International Conference on Multimedia
(2017)
M. Ravanbakhsh et al.
Abnormal event detection in videos using generative adversarial nets
2017 IEEE International Conference on Image Processing
(2017)
A. Samet et al.
GANomaly: semi-supervised anomaly detection via adversarial training
Asian Conference on Computer Vision. Springer, Cham
(2018)
N. Li et al.
Spatial-temporal cascade autoencoder for video anomaly detection in crowded scenes
IEEE Trans Multimedia
(2020)
R. Manassés et al.
A study of deep convolutional auto-encoders for anomaly detection in videos
Pattern Recognit Lett
(2018)
D. Xu et al.
Detecting anomalous events in videos by learning deep representations of appearance and motion
Computer Vision and Image Understanding
(2017)


Lingyi Yue received a Bachelor's degree in Automation from Xiangtan University, China. She has been a postgraduate at the College of Information Science and Engineering of Northeastern University, Shenyang, China, since 2018. Her research interests include computer vision, anomaly detection, and pattern recognition.

Xingya Chang is currently pursuing a Ph.D. at the College of Information Science and Engineering of Northeastern University, Shenyang, China. His research focuses on anomaly detection and computer vision.

Ming Xu received a Bachelor's degree in Automation and a Master's degree in Pattern Recognition and Intelligent System from Northeastern University (China). He is currently a Ph.D. student. His research covers computer vision, image processing and machine learning, with a particular focus on object detection and recognition in visible-light images, infrared thermal images and medical images.

Tong Jia was born in Shenyang, China, in 1975. He received the bachelor's degree in computer science and the Ph.D. degree in pattern identification and intelligent system from Northeastern University, China, in 1998 and 2008, respectively. He is currently a Professor with the College of Information Science and Engineering, Northeastern University. His research interests include computer/machine vision, image processing, and pattern identification.
