Deep anomaly detection through visual attention in surveillance videos
Journal of Big Data (IF 8.1), Pub Date: 2020-10-16, DOI: 10.1186/s40537-020-00365-y
Nasaruddin Nasaruddin, Kahlil Muchtar, Afdhal Afdhal, Alvin Prayuda Juniarta Dwiyantoro

This paper describes a method for learning anomalous behavior in video by finding an attention region from spatiotemporal information, in contrast to full-frame learning. The proposed method employs robust background subtraction (BG) to extract motion, which indicates the location of the attention regions. The resulting regions are then fed into a three-dimensional Convolutional Neural Network (3D CNN). Specifically, a deep convolutional network based on C3D (Convolution 3-dimensional) is developed to fully exploit spatiotemporal relations and to distinguish normal events from anomalous ones. The system is trained and tested on the large-scale UCF-Crime anomaly dataset to validate its effectiveness. This dataset contains 1900 long, untrimmed real-world surveillance videos, split into 950 anomalous events and 950 normal events. In total, approximately 13 million frames are used across the training and testing phases. As shown in the experiments section, the proposed visual attention model achieves an accuracy of 99.25%. From an industrial application point of view, extracting this attention region can help a security officer focus on the corresponding anomalous region instead of inspecting the wider, full frame.
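
To make the first stage of the pipeline concrete, the following is a minimal sketch of how motion-based background subtraction can yield an attention region that is then cropped out of every frame. It is not the authors' implementation: a simple running-average background model stands in for the robust BG method named in the abstract, and the function names `attention_box` and `crop_clip`, the parameters `alpha` and `threshold`, and the nested-list frame format are all assumptions made for illustration.

```python
# Illustrative sketch only. A running-average background model stands in for
# the robust background subtraction used in the paper; all names and
# parameter values here are hypothetical.

def attention_box(frames, alpha=0.05, threshold=30):
    """Return (top, left, bottom, right) covering all moving pixels, or None.

    frames: list of grayscale frames, each a list of rows of ints (0-255).
    alpha: background adaptation rate (assumed tuning parameter).
    threshold: minimum absolute difference that counts as motion (assumed).
    """
    # Initialise the background model from the first frame.
    background = [[float(p) for p in row] for row in frames[0]]
    box = None
    for frame in frames[1:]:
        for y, row in enumerate(frame):
            for x, pixel in enumerate(row):
                if abs(pixel - background[y][x]) > threshold:
                    # Foreground pixel: grow the bounding box to include it.
                    if box is None:
                        box = [y, x, y, x]
                    else:
                        box = [min(box[0], y), min(box[1], x),
                               max(box[2], y), max(box[3], x)]
                # Slowly adapt the background towards the current frame.
                background[y][x] = (1 - alpha) * background[y][x] + alpha * pixel
    return tuple(box) if box else None


def crop_clip(frames, box):
    """Crop every frame to the box, giving the region clip a 3D CNN would see."""
    top, left, bottom, right = box
    return [[row[left:right + 1] for row in frame[top:bottom + 1]]
            for frame in frames]
```

In a full system the cropped clip would be resized and passed to the 3D CNN classifier; that stage is omitted here because the abstract does not give the network configuration.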

Updated: 2020-10-17
