Harnessing high-level concepts, visual, and auditory features for violence detection in videos
Journal of Visual Communication and Image Representation (IF 2.6), Pub Date: 2021-06-03, DOI: 10.1016/j.jvcir.2021.103174
Bruno M. Peixoto, Bahram Lavi, Zanoni Dias, Anderson Rocha

Among the kinds of sensitive media, violence is one of the hardest to define objectively and is therefore a significant challenge to detect automatically. While many studies have addressed the detection of specific aspects of violence, very few attempt to approach the general concept. We propose a method that aims to enable machines to understand the high-level concept of violence by first breaking it down into smaller, more objective sub-concepts, such as fights, explosions, blood, and gunshots, and later combining them, leading to a better understanding of the scene. To this end, we leverage the characteristics of each individual sub-concept of violence (relying upon custom-tailored convolutional neural networks) to guide how it should be described. For example, a fight scene should incorporate temporal features that a scene with blood does not need, whereas a scene with explosions or gunshots should rely more heavily on its audio features. With this multimodal approach, we trained visual and auditory feature detectors and later combined them in a decision neural network, yielding a violence detector that considers several different aspects of the problem. This robust and modular approach allows different cultures and users to adapt the detector to their specific needs.
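
The combination step described in the abstract can be illustrated with a minimal late-fusion sketch. This is not the authors' implementation: the abstract describes a decision neural network, whereas the code below substitutes a single logistic unit as a stand-in. The function name `fuse_scores`, the weights, the bias, and the example scores are all hypothetical; only the four sub-concept names come from the abstract.

```python
import math

# Sub-concept names come from the abstract; everything else here is an
# illustrative assumption, not the paper's actual model.
CONCEPTS = ("fights", "explosions", "blood", "gunshots")


def sigmoid(x: float) -> float:
    """Map a real-valued score to the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_scores(scores: dict, weights: dict, bias: float = 0.0) -> float:
    """Late fusion: weighted sum of per-concept detector scores,
    squashed by a sigmoid into a single violence score.

    A single logistic unit stands in for the decision neural network
    mentioned in the abstract.
    """
    z = bias + sum(weights[c] * scores[c] for c in CONCEPTS)
    return sigmoid(z)


# Hypothetical outputs of four separately trained sub-concept detectors.
scores = {"fights": 0.9, "explosions": 0.1, "blood": 0.7, "gunshots": 0.05}

# Hypothetical weights; a user could change these to emphasize or
# de-emphasize a sub-concept without retraining the detectors.
weights = {c: 1.0 for c in CONCEPTS}

violence_probability = fuse_scores(scores, weights, bias=-1.0)
```

Keeping the sub-concept detectors separate from the fusion step is what makes the modularity claim concrete in this sketch: changing `weights` alters how much each sub-concept contributes while the detectors themselves stay untouched.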




Updated: 2021-06-07