A deep learning approach to building an intelligent video surveillance system, Multimedia Tools and Applications



Multimedia Tools and Applications (IF 3.6), Pub Date: 2020-10-07, DOI: 10.1007/s11042-020-09964-6
Jie Xu

Recent advances in object detection and face recognition have made it possible to develop practical video surveillance systems with embedded object detection and face recognition functionality that are accurate and fast enough for commercial use. In this paper, we compare some of the latest approaches to object detection and face recognition and explain why they may or may not be among the best choices for video surveillance applications in terms of both accuracy and speed. We find that Faster R-CNN with Inception ResNet V2 achieves some of the best accuracies while maintaining real-time rates. Single Shot Detector (SSD) with MobileNet, on the other hand, is very fast and still accurate enough for most applications. For face recognition, FaceNet with Multi-task Cascaded Convolutional Networks (MTCNN) achieves higher accuracy than other approaches such as DeepFace and DeepID2+ while being faster. We also propose an end-to-end video surveillance system that could serve as a starting point for more complex systems. We report various experiments on the trained models and explain our observations in detail. We finish by discussing video object detection and video salient object detection approaches that could be used as future improvements to the proposed system.


Updated: 2020-10-07
