Timed-Image Based Deep Learning for Action Recognition in Video Sequences, Pattern Recognition

Pattern Recognition (IF 7.5), Pub Date: 2020-08-01, DOI: 10.1016/j.patcog.2020.107353
Abdourrahmane Mahamane Atto , Alexandre Benoit , Patrick Lambert

Abstract

The paper addresses two issues related to machine learning on 2D + X data volumes, where 2D refers to image observation and X denotes a variable that can be associated with time, depth, wavelength, etc. The first issue is conditioning these structured volumes so that they are compatible with convolutional neural networks operating on 2D image file formats. The second issue is sensitive action detection in the "2D + Time" case (video clips and image time series). For the data conditioning issue, the paper first highlights that mapping 2D spatial convolution to its 1D, Hilbert-curve-based counterpart is highly accurate in terms of information compressibility on tight frames of convolutional networks. As a consequence of this compressibility, the paper proposes converting the 2D + X data volume into a single meta-image file format before it is passed to machine learning frameworks. In this conversion, each 2D frame of the 2D + X data is reshaped into a 1D array indexed by a Hilbert space-filling curve, and the third variable X of the initial file format becomes the second variable of the meta-image format. For the sensitive action recognition issue, the paper provides: (i) a 3-category video database of non-violent, moderately violent and extremely violent actions; (ii) the conversion of this database into a timed meta-image database using the 2D + Time to 2D conditioning stage described above; and (iii) outstanding 2-level and 3-level violence classification results from deep convolutional neural networks operating on meta-image databases.
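The 2D + X to meta-image conversion summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes square frames whose side is a power of two and uses the classic iterative index-to-coordinate Hilbert mapping; the function names (`hilbert_d2xy`, `hilbert_order`, `to_meta_image`) are hypothetical. Each frame becomes one column of the meta-image, so row `d` holds the pixel at Hilbert position `d` for every value of X.

```python
"""Sketch (not the paper's code): flatten each 2D frame of a 2D + X volume
along a Hilbert curve and stack the frames so X becomes the second axis.
Assumes square frames with a power-of-two side length."""

from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[float]]


def _rotate(s: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Flip/transpose a sub-square so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x = s - 1 - x
            y = s - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(n: int, d: int) -> Tuple[int, int]:
    """Map position d along the Hilbert curve to (x, y) on an n x n grid."""
    x = y = 0
    s = 1
    t = d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(n: int) -> List[Tuple[int, int]]:
    """Pixel coordinates, read here as (row, col), in Hilbert-curve order."""
    if n < 1 or n & (n - 1):
        raise ValueError("frame side must be a power of two")
    return [hilbert_d2xy(n, d) for d in range(n * n)]


def to_meta_image(volume: Sequence[Frame]) -> List[List[float]]:
    """Turn volume[k][row][col] into meta[d][k].

    The first axis is the Hilbert index d of the pixel; the second axis is
    the index k of the third variable X (e.g. time).
    """
    n = len(volume[0])
    order = hilbert_order(n)
    return [[frame[r][c] for frame in volume] for (r, c) in order]


if __name__ == "__main__":
    # Three 4x4 frames; frame k holds the value 100*k + 4*row + col.
    video = [[[100 * k + 4 * r + c for c in range(4)] for r in range(4)]
             for k in range(3)]
    meta = to_meta_image(video)
    print(len(meta), len(meta[0]))  # 16 3
```

Because consecutive Hilbert indices always map to neighbouring pixels, a 1D convolution along the first (Hilbert) axis of the meta-image only ever combines spatially adjacent pixels; a plain row-by-row raster scan would break this adjacency at the end of every image row.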

Updated: 2020-08-01