A data augmentation method for human action recognition using dense joint motion images, Applied Soft Computing



A data augmentation method for human action recognition using dense joint motion images
Applied Soft Computing (IF 7.2) Pub Date: 2020-09-14, DOI: 10.1016/j.asoc.2020.106713
Leiyue Yao, Wei Yang, Wei Huang

With the development of deep learning and neural network techniques, human action recognition has made great progress in recent years. However, it remains challenging to analyse temporal information and identify human actions from few training samples. In this paper, an effective motion image called the dense joint motion image (DJMI) is proposed to transform an action into an image. Our method is compared with state-of-the-art methods, and its contributions are reflected in three main characteristics. First, in contrast to the classic joint trajectory map (JTM), every pixel of the DJMI is useful and carries essential spatio-temporal information. Thus, the number of input parameters of the deep neural network (DNN) is reduced by an order of magnitude, and the efficiency of action recognition is improved. Second, each frame of an action video is encoded as an independent slice of the DJMI, which avoids the information loss caused by overlapping action trajectories. Third, because actions are represented as DJMIs, proven graphics and image-processing algorithms can be used to generate training samples. Compared with the original image, the generated DJMIs contain new and different spatio-temporal information, which enables DNNs to be trained well on very few samples. Our method was evaluated on three benchmark datasets: Florence-3D, UTKinect-Action3D and MSR Action3D. The results show that it achieves a recognition speed of 37 fps with competitive accuracy on these datasets. Its time efficiency and few-shot learning capability make it suitable for real-time surveillance.
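The abstract does not spell out how a DJMI is laid out, so the following is only a minimal sketch of the general idea, not the authors' encoding. It assumes a skeleton sequence stored as a NumPy array of shape (T, J, 3), writes each frame into its own image column (one "slice" per frame, so trajectories cannot overlap), and maps the x/y/z coordinates to the three colour channels. The function names `encode_sequence` and `resample_time`, the min-max scaling, and the choice of time-axis resampling as the image-based augmentation are all assumptions made for illustration.

```python
import numpy as np


def encode_sequence(joints: np.ndarray) -> np.ndarray:
    """Encode a skeleton sequence of shape (T, J, 3) as a (J, T, 3) uint8 image.

    Hypothetical layout (not taken from the paper): column t holds frame t,
    row j holds joint j, and x/y/z are min-max scaled into the colour channels.
    """
    if joints.ndim != 3 or joints.shape[2] != 3:
        raise ValueError("expected an array of shape (T, J, 3)")
    lo = joints.min(axis=(0, 1), keepdims=True)
    hi = joints.max(axis=(0, 1), keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)      # avoid division by zero
    scaled = (joints - lo) / span               # each channel now lies in [0, 1]
    image = np.rint(scaled * 255).astype(np.uint8)
    return image.transpose(1, 0, 2)             # rows = joints, columns = frames


def resample_time(image: np.ndarray, new_width: int) -> np.ndarray:
    """Stretch or compress the time axis by linear interpolation.

    One example of an ordinary image operation producing a new training
    sample (here, the same action performed faster or slower).
    """
    _, width, _ = image.shape
    src = np.linspace(0.0, width - 1, new_width)  # source column per output column
    left = np.floor(src).astype(int)
    right = np.minimum(left + 1, width - 1)
    frac = (src - left)[None, :, None]
    blended = (1.0 - frac) * image[:, left, :] + frac * image[:, right, :]
    return np.rint(blended).astype(np.uint8)


# Usage with made-up sizes: 40 frames, 15 joints, random coordinates.
rng = np.random.default_rng(0)
sequence = rng.normal(size=(40, 15, 3))
djmi = encode_sequence(sequence)        # image with one column per frame
slower = resample_time(djmi, 60)        # augmented sample with a longer time axis
```

In this layout every pixel stores a joint coordinate, which is the property the abstract contrasts with the JTM; the actual pixel ordering, colour mapping and augmentation operators used in the paper may differ.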


Updated: 2020-09-14
