Mobile Neural Architecture Search Network and Convolutional Long Short-Term Memory-Based Deep Features Toward Detecting Violence from Video
Arabian Journal for Science and Engineering (IF 2.9), Pub Date: 2021-04-08, DOI: 10.1007/s13369-021-05589-5
Heyam M. Bin Jahlan, Lamiaa A. Elrefaei

Surveillance cameras are now deployed in many public places to monitor human activity. Detecting violence in video through automatic analysis is of significant value to law enforcement, yet most monitoring systems still require violent scenes to be identified manually, which slows the response. Violence detection is a challenging problem because violence is so broadly defined. In this work, we are concerned with physical violence involving two or more persons. We propose a novel method for detecting violence that uses an automated mobile neural architecture search network and convolutional long short-term memory to extract spatiotemporal features from video. Two types of pooling layers, max pooling and average pooling, are then added to capture richer features. The features are standardized, and their dimensionality is reduced with linear discriminant analysis, which removes redundant features and lets the classifiers work well in a low-dimensional space. For classification, we trained and tested several machine learning models: random forest, support vector machine (SVM), and K-nearest neighbor classifiers. We built a combined dataset containing violent and non-violent scenes drawn from three public datasets: hockey, movie, and violent flow. The proposed method is evaluated in terms of detection accuracy on the combined dataset and on each of the three benchmark datasets. With the SVM classifier, our model achieved accuracies of 97.5%, 100%, and 96% on the combined, movie, and violent flow datasets, respectively, whereas on the hockey dataset the best result, 99.3%, was obtained with the random forest classifier.
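The post-extraction stages described in the abstract (max and average pooling, standard scaling, linear discriminant analysis, then a classifier) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation. The deep feature extractor is omitted, and random arrays stand in for its output. The array shape, the synthetic class shift, the helper names (`pool_features`, `make_synthetic_maps`, `evaluate_classifiers`), and all hyperparameters are assumptions made for illustration only.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def pool_features(feature_maps):
    """Concatenate global max and global average pooling per channel.

    feature_maps has shape (n_videos, height, width, channels) and is a
    hypothetical stand-in for the spatiotemporal feature map of each video.
    """
    max_pooled = feature_maps.max(axis=(1, 2))
    avg_pooled = feature_maps.mean(axis=(1, 2))
    return np.concatenate([max_pooled, avg_pooled], axis=1)


def make_synthetic_maps(n_videos=200, seed=0):
    """Generate placeholder feature maps and labels (assumption: no real data)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n_videos)  # 1 = violent, 0 = non-violent
    maps = rng.normal(size=(n_videos, 4, 4, 16))
    # Shift the "violent" class so the toy problem is separable.
    maps += 0.8 * labels[:, None, None, None]
    return maps, labels


def evaluate_classifiers(features, labels, seed=0):
    """Scale, reduce with LDA, and score three classifiers on a held-out split."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, random_state=seed, stratify=labels
    )
    classifiers = {
        "svm": SVC(kernel="rbf"),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "knn": KNeighborsClassifier(n_neighbors=5),
    }
    scores = {}
    for name, classifier in classifiers.items():
        # With two classes, LDA can project onto at most one component.
        model = make_pipeline(
            StandardScaler(),
            LinearDiscriminantAnalysis(n_components=1),
            classifier,
        )
        model.fit(x_train, y_train)
        scores[name] = model.score(x_test, y_test)
    return scores


if __name__ == "__main__":
    maps, labels = make_synthetic_maps()
    features = pool_features(maps)
    for name, accuracy in evaluate_classifiers(features, labels).items():
        print(f"{name}: {accuracy:.3f}")
```

Fitting the scaler and the discriminant analysis inside a pipeline keeps both steps estimated from the training split only, which avoids leaking test statistics into the reduced features. The accuracies printed by this sketch describe the synthetic data and say nothing about the results reported in the abstract.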




Updated: 2021-04-08