Decoupled appearance and motion learning for efficient anomaly detection in surveillance video
Computer Vision and Image Understanding (IF 4.3) Pub Date: 2021-07-18, DOI: 10.1016/j.cviu.2021.103249
Bo Li, Sam Leroux, Pieter Simoens

Automating the analysis of surveillance video footage is of great interest when urban environments or industrial sites are monitored by large numbers of cameras. Because anomalies are often context-specific, it is hard to predefine events of interest and collect labeled training data, so a purely unsupervised approach to automated anomaly detection is much more suitable. A separate algorithm can then be deployed for each camera that learns, over time, a baseline model of the appearance- and motion-related features of the objects within the camera viewport. Anything that deviates from this baseline is flagged as an anomaly for further analysis downstream. We propose a new neural network architecture that learns normal behavior in a purely unsupervised fashion. In contrast to previous work, we use latent code prediction as our anomaly metric. We show that this outperforms frame reconstruction-based and prediction-based methods on several benchmark datasets, both in accuracy and in robustness against changing lighting and weather conditions. By decoupling the appearance and motion models, our model can also process 16 to 45 times more frames per second than related approaches, which makes it suitable for deployment on the camera itself or on other edge devices.
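The core idea of scoring anomalies by latent code prediction error, rather than pixel-level frame reconstruction, can be sketched as follows. This is a minimal illustrative sketch, not the paper's architecture: the `encode` function stands in for a learned appearance encoder, and `predict_next` (here a simple linear extrapolation) stands in for the learned motion model; all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 8
FRAME_SHAPE = (16, 16)
# Fixed random projection as a stand-in for a trained CNN encoder.
PROJ = rng.normal(size=(LATENT_DIM, FRAME_SHAPE[0] * FRAME_SHAPE[1])) / 16.0

def encode(frame):
    # Appearance model stand-in: map a frame to a low-dimensional latent code.
    return PROJ @ frame.ravel()

def predict_next(codes):
    # Motion model stand-in: linearly extrapolate the next latent code from
    # the last two observed codes (the paper trains a network for this step).
    return 2 * codes[-1] - codes[-2]

def anomaly_score(frames):
    # Score a clip by how far the actual latent code of the final frame lies
    # from the code predicted by the motion model.
    codes = [encode(f) for f in frames]
    pred = predict_next(codes[:-1])
    return float(np.linalg.norm(pred - codes[-1]))

# Smoothly drifting frames model "normal" motion; replacing the last frame
# with unrelated content models an anomalous event.
base = rng.normal(size=FRAME_SHAPE)
drift = 0.01 * rng.normal(size=FRAME_SHAPE)
normal_seq = [base + i * drift for i in range(4)]
anomalous_seq = normal_seq[:3] + [rng.normal(size=FRAME_SHAPE)]

print("normal:   ", anomaly_score(normal_seq))     # small for smooth motion
print("anomalous:", anomaly_score(anomalous_seq))  # much larger
```

Because the score is computed in a compact latent space, only the small predictor runs per frame pair, which is what makes this style of metric cheap enough for edge deployment compared with reconstructing full frames.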




Updated: 2021-08-01