Video anomaly detection and localization via Gaussian Mixture Fully Convolutional Variational Autoencoder,Computer Vision and Image Understanding


Computer Vision and Image Understanding (IF 4.3), Pub Date: 2020-04-10, DOI: 10.1016/j.cviu.2020.102920
Yaxiang Fan, Gongjian Wen, Deren Li, Shaohua Qiu, Martin D. Levine, Fei Xiao

We present a novel end-to-end, partially supervised deep learning approach for video anomaly detection and localization that is trained using only normal samples. The insight that motivates this study is that normal samples can be associated with at least one Gaussian component of a Gaussian Mixture Model (GMM), while anomalies do not belong to any Gaussian component. The method is based on the Gaussian Mixture Variational Autoencoder, which learns feature representations of the normal samples as a Gaussian Mixture Model trained using deep learning. A Fully Convolutional Network (FCN), which contains no fully-connected layer, is employed for the encoder–decoder structure to preserve the relative spatial coordinates between the input image and the output feature map. Based on the joint probabilities of each of the Gaussian mixture components, we introduce a sample-energy-based method to score the anomaly of test image patches. A two-stream network framework is employed to combine appearance and motion anomalies, using RGB frames for the former and dynamic flow images for the latter. We test our approach on two popular benchmarks (the UCSD dataset and the Avenue dataset). The experimental results verify the superiority of our method compared to the state of the art.
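The abstract does not state the scoring formula. As a rough illustration only, the sketch below assumes a form of sample energy commonly used with Gaussian mixtures: the negative log of the sum over components of the mixture weight times the component density. The function name `sample_energy`, the diagonal-covariance assumption, and the toy parameters are all hypothetical and are not taken from the paper.

```python
import numpy as np

def sample_energy(z, weights, means, variances):
    """Negative log-likelihood of each row of z under a diagonal-covariance GMM.

    Assumed shapes: z (N, D); weights (K,); means and variances (K, D).
    Higher energy means the sample fits none of the components well.
    """
    z = np.asarray(z, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    diff = z[:, None, :] - means[None, :, :]                  # (N, K, D)
    # log N(z | mu_k, diag(var_k)) for every sample/component pair
    log_density = -0.5 * (
        np.sum(diff ** 2 / variances[None, :, :], axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    )                                                         # (N, K)
    log_joint = np.log(weights)[None, :] + log_density        # log(pi_k * N_k)

    # log-sum-exp over components for numerical stability
    m = log_joint.max(axis=1, keepdims=True)
    log_likelihood = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
    return -log_likelihood

# Toy mixture standing in for a model fitted on normal samples (hypothetical values).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

# Two points at component centres and one point far from both.
energies = sample_energy([[0.0, 0.0], [5.0, 5.0], [2.5, -4.0]],
                         weights, means, variances)
print(energies)
```

In this toy setup the two points at the component centres receive equal, low energy, while the point far from both components receives a higher one; thresholding such a score per image patch is one plausible way to flag anomalies.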


Updated: 2020-04-10
