Deep multi-level feature pyramids: Application for non-canonical firearm detection in video surveillance, Engineering Applications of Artificial Intelligence

Deep multi-level feature pyramids: Application for non-canonical firearm detection in video surveillance
Engineering Applications of Artificial Intelligence (IF 7.5), Pub Date: 2020-11-23, DOI: 10.1016/j.engappai.2020.104094
JunYi Lim, Md Istiaque Al Jobayer, Vishnu Monn Baskaran, Joanne MunYee Lim, John See, KokSheik Wong

The worldwide epidemic of gun violence calls for an active video surveillance network to combat this crime. In this context, autonomously detecting handguns is crucial for capturing firearm-related crimes. However, current deep-learning object detectors are unable to capture handguns at different scales in an unconstrained environment. Hence, this paper puts forward an enhanced deep multi-level feature pyramid network that addresses the difficulty of inferring handguns from a non-canonical perspective. We first construct a dataset of handguns in an unconstrained environment for representation learning. The dataset is built from 250 recorded videos and contains over 2,500 distinct labeled frames. Crucially, these labeled frames address the absence of a suitable video-surveillance-based handgun dataset. We then use the dataset to train a multi-level multi-scale object detector, M2Det. We further improve the performance of M2Det by: (1) enhancing the base features by concatenating shallow, medium and deep features from the backbone according to their relative receptive fields; (2) adopting generalized intersection-over-union (GIoU) as its localization loss; and (3) integrating Focal Loss as its classification loss to improve the detection of small-scale handguns. Experiments on a challenging video surveillance test dataset demonstrate that the proposed model achieves 87.42% accuracy. In addition, we implement adaptive surveillance image partitioning to redetect handguns in specific regions. This method potentially addresses the challenge of sporadically poor handgun classification in real-world footage. This model could pioneer non-canonical handgun detection for active video surveillance systems. The dataset and trained models are available at: https://github.com/MarcusLimJunYi/Monash-Guns-Dataset.
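The GIoU localization loss in point (2) can be illustrated with a minimal pure-Python sketch for a single pair of boxes. It follows the standard definition GIoU = IoU - |C \ (A ∪ B)| / |C|, where C is the smallest box enclosing both boxes, and uses 1 - GIoU as the loss. The (x1, y1, x2, y2) box format and the function names are assumptions for illustration; this is not the authors' code, and it assumes both boxes have positive area.

```python
def box_area(box):
    """Area of an axis-aligned box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def giou(pred, target):
    """Generalized IoU between two boxes in (x1, y1, x2, y2) format."""
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = box_area(pred) + box_area(target) - inter
    iou = inter / union
    # Smallest box C that encloses both boxes.
    cx1, cy1 = min(pred[0], target[0]), min(pred[1], target[1])
    cx2, cy2 = max(pred[2], target[2]), max(pred[3], target[3])
    enclose = (cx2 - cx1) * (cy2 - cy1)
    # GIoU subtracts the fraction of C that the union does not cover.
    return iou - (enclose - union) / enclose


def giou_loss(pred, target):
    """Localization loss 1 - GIoU, which lies in [0, 2]."""
    return 1.0 - giou(pred, target)
```

Unlike a plain IoU loss, which is constant (1) for every non-overlapping pair, this loss still grows as a predicted box moves farther from the ground truth, so a badly placed box on a small handgun is still penalized in proportion to its distance.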
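Point (3) refers to Focal Loss, which scales the cross-entropy of each prediction by (1 - p_t)^gamma so that easy, well-classified examples (mostly background) contribute less. What follows is a minimal sketch of the standard binary form for a single prediction; the defaults alpha = 0.25 and gamma = 2 come from the original RetinaNet formulation and are assumptions here, since the abstract does not state the values used.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for a single prediction.

    p: predicted probability of the positive class (e.g. "handgun").
    y: ground-truth label, 1 for positive and 0 for background.
    alpha, gamma: assumed defaults (RetinaNet values), not the paper's settings.
    """
    p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of confidently correct examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 2, a background anchor predicted at p = 0.1 (so p_t = 0.9) has its cross-entropy term scaled by (1 - 0.9)^2 = 0.01, so the many easy background anchors weigh far less than hard examples such as small, partly visible handguns. Setting gamma = 0 recovers alpha-weighted cross-entropy.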
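Point (1), fusing shallow, medium and deep backbone features into one base feature, can be sketched as resizing each level to a common spatial size and concatenating along the channel axis. In this hypothetical sketch, feature maps are plain nested lists of shape [C][H][W], the medium level's resolution is chosen as the target, and nearest-neighbour resampling stands in for whatever pooling, upsampling and 1x1 convolutions the actual network uses; none of these choices is taken from the paper.

```python
def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a feature map stored as [C][H][W] lists."""
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    return [
        [[ch[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
         for r in range(out_h)]
        for ch in fmap
    ]


def fuse_base_features(shallow, medium, deep):
    """Hypothetical fusion: bring all three levels to the medium level's
    resolution, then concatenate them along the channel axis."""
    h, w = len(medium[0]), len(medium[0][0])
    # Shallow (high-resolution) maps are downsampled, deep (low-resolution)
    # maps are upsampled; channel lists are then simply concatenated.
    return resize_nearest(shallow, h, w) + medium + resize_nearest(deep, h, w)
```

Concatenation keeps every level's channels intact, so the fused base feature carries fine spatial detail from shallow layers alongside the larger receptive fields of deep layers; a real implementation would usually also reduce channels before fusing to control cost.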
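The abstract does not specify how the adaptive image partitioning works, but the general idea of re-running a detector on sub-regions can be sketched as tiling a region of interest into overlapping crops and mapping any detections back to frame coordinates. The fixed grid, the `overlap` fraction and both function names are hypothetical; an adaptive scheme would choose the region and grid from the frame content rather than from fixed parameters.

```python
def partition_region(roi, rows=2, cols=2, overlap=0.2):
    """Hypothetical tiling: split a region (x1, y1, x2, y2) into a rows x cols
    grid of tiles, each padded by `overlap` of its size and clipped to the region."""
    x1, y1, x2, y2 = roi
    tw, th = (x2 - x1) / cols, (y2 - y1) / rows
    pad_w, pad_h = tw * overlap / 2, th * overlap / 2
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tiles.append((
                max(x1, x1 + c * tw - pad_w),
                max(y1, y1 + r * th - pad_h),
                min(x2, x1 + (c + 1) * tw + pad_w),
                min(y2, y1 + (r + 1) * th + pad_h),
            ))
    return tiles


def tile_box_to_frame(box, tile, scale):
    """Map a box detected on a tile that was resized by `scale` back to
    full-frame coordinates (undo the resize, then add the tile offset)."""
    tx, ty = tile[0], tile[1]
    return (tx + box[0] / scale, ty + box[1] / scale,
            tx + box[2] / scale, ty + box[3] / scale)
```

Presumably each tile would be upscaled before being passed to the detector so that a small handgun occupies more pixels; `scale` records that resize factor so the resulting boxes can be projected back. The overlap keeps an object that straddles a tile border fully inside at least one tile.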

Updated: 2020-11-23
