Fast Learning Through Deep Multi-Net CNN Model For Violence Recognition In Video Surveillance, The Computer Journal


The Computer Journal (IF 1.4), Pub Date: 2020-07-06, DOI: 10.1093/comjnl/bxaa061
Aqib Mumtaz, Allah Bux Sargano, Zulfiqar Habib

Violence detection in video has mostly relied on handcrafted feature descriptors, although some researchers have also employed deep learning-based representation models for violent activity recognition. Deep learning-based models have achieved encouraging results for fight recognition on benchmark data sets such as hockey and movies. However, these models have limitations in learning discriminative features for violent activity classification in the presence of abrupt camera motion. This work investigates deep representation models that use transfer learning to handle abrupt camera motion, and proposes a novel deep multi-net (DMN) architecture based on AlexNet and GoogleNet for violence detection in videos. AlexNet and GoogleNet are top-ranked pre-trained models for image classification, each with distinct pre-learnt features, and fusing them can yield superior performance. The proposed DMN exploits this combined potential by merging both networks concurrently. The results show that DMN outperforms state-of-the-art methods by learning fine discriminative features, achieving 99.82% and 100% accuracy on the hockey and movies data sets, respectively. Moreover, DMN learns faster: it is 1.33 and 2.28 times faster than AlexNet and GoogleNet, respectively, which makes it an effective learning architecture for images and videos.
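
To make the reported speedups concrete, the short sketch below converts them into relative training time. It assumes that "k times faster" means the baseline's training time divided by k; the abstract does not state this definition, and the helper name `relative_training_time` is hypothetical.

```python
def relative_training_time(speedup: float) -> float:
    """Fraction of a baseline's training time that the faster model needs.

    Assumption (not stated in the abstract): "k times faster" means
    faster_time = baseline_time / k.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


# Speedup factors of DMN over each baseline, as reported in the abstract.
reported_speedups = {"AlexNet": 1.33, "GoogleNet": 2.28}

for baseline, factor in reported_speedups.items():
    fraction = relative_training_time(factor)
    print(
        f"DMN needs about {fraction:.1%} of {baseline}'s training time "
        f"(about {1 - fraction:.1%} less)"
    )
```

Under this reading, DMN would need roughly 75.2% of AlexNet's training time and roughly 43.9% of GoogleNet's.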


Updated: 2020-07-06
