A multi-stream CNN for deep violence detection in video sequences using handcrafted features, The Visual Computer



The Visual Computer (IF 3.0), Pub Date: 2021-07-26, DOI: 10.1007/s00371-021-02266-4
Seyed Mehdi Mohtavipour ₁, Mahmoud Saeidi ₂, Abouzar Arabsorkhi ₂


Intelligent video surveillance systems have recently been used for automatic monitoring of human interactions. Although they play a significant role in reducing security concerns, distinguishing normal from abnormal behavior remains challenging because of factors such as crowded environments and camera viewpoint. In this paper, we propose a novel deep violence detection framework based on specific features derived from handcrafted methods. These features, which relate to appearance, speed of movement, and a representative image, are fed to a convolutional neural network (CNN) as spatial, temporal, and spatiotemporal streams. The spatial stream trains the network on each frame of the video to learn environment patterns. The temporal stream uses three consecutive frames to learn the motion patterns of violent behavior through a modified differential magnitude of optical flow. In the spatiotemporal stream, we introduce a discriminative feature, a novel differential motion energy image, that represents violent actions in a more interpretable way. Fusing the results of these streams lets the approach cover different aspects of violent behavior. The proposed CNN is trained on violence-labeled and normal-labeled frames from three datasets (Hockey, Movie, and ViF), which together comprise both crowded and uncrowded situations. The experimental results show that the proposed approach outperforms state-of-the-art methods in terms of accuracy and processing time.
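
To make the spatiotemporal and fusion ideas concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not define the differential motion energy image or the fusion rule, so the code assumes a plain motion energy image built from thresholded frame differences and a weighted average of per-stream violence scores. The names `frame_difference`, `motion_energy_image`, and `fuse_scores`, the `threshold` parameter, and the equal default weights are all hypothetical.

```python
from typing import List, Sequence

# A grayscale frame as rows of pixel intensities (assumed to lie in [0, 1]).
Frame = List[List[float]]


def frame_difference(a: Frame, b: Frame) -> Frame:
    """Absolute per-pixel difference between two equally sized frames."""
    return [[abs(pb - pa) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def motion_energy_image(frames: Sequence[Frame], threshold: float = 0.1) -> Frame:
    """Plain motion energy image (an assumption, not the paper's variant).

    Each pixel holds the fraction of consecutive frame pairs in which its
    intensity changed by more than `threshold`.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    height, width = len(frames[0]), len(frames[0][0])
    energy = [[0.0] * width for _ in range(height)]
    for prev, curr in zip(frames, frames[1:]):
        diff = frame_difference(prev, curr)
        for y in range(height):
            for x in range(width):
                if diff[y][x] > threshold:
                    energy[y][x] += 1.0
    pairs = len(frames) - 1
    return [[value / pairs for value in row] for row in energy]


def fuse_scores(
    spatial: float,
    temporal: float,
    spatiotemporal: float,
    weights: Sequence[float] = (1.0, 1.0, 1.0),
) -> float:
    """Late fusion by weighted average of the three stream scores (assumed rule)."""
    scores = (spatial, temporal, spatiotemporal)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

In a setup like this, the motion energy image would be one input to the spatiotemporal stream, and `fuse_scores` would combine the per-stream probabilities into a single decision; how the paper actually weights or combines the streams is not stated in the abstract.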


Updated: 2021-07-27
