Human action recognition based on multi-scale feature maps from depth video sequences


Multimedia Tools and Applications (IF 3.6), Pub Date: 2021-07-24, DOI: 10.1007/s11042-021-11193-4
Chang Li₁, Qian Huang₁, Xing Li₁, Qianhan Wu₁


Human action recognition is an active research area in computer vision. Although great progress has been made, previous methods mostly recognize actions from depth video sequences at only one scale, and thus often neglect the multi-scale spatial changes that provide additional information in practical applications. In this paper, we present a novel framework with a multi-scale mechanism to improve the scale diversity of motion features. We propose a multi-scale feature map called Laplacian pyramid depth motion images (LP-DMI). First, we employ depth motion images (DMI) as templates to generate a multi-scale static representation of actions. Then, we calculate LP-DMI to enhance the multi-scale dynamic information of motions and reduce redundant static information in human bodies. We further extract a multi-granularity descriptor called LP-DMI-HOG to provide more discriminative features. Finally, we use an extreme learning machine (ELM) for action classification. The proposed method yields recognition accuracies of 93.41%, 85.12%, and 91.94% on the public MSRAction3D, UTD-MHAD, and DHA datasets, respectively. Extensive experiments show that our method outperforms state-of-the-art benchmarks.
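
The two generic building blocks named in the abstract, a Laplacian pyramid and an extreme learning machine, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The box-filter downsampling, nearest-neighbour upsampling, per-pixel maximum used as a motion template, and flattened pyramid levels used in place of HOG features are all simplifying assumptions, and every function and class name below is hypothetical.

```python
import numpy as np

def downsample(img):
    """Halve resolution with a 2x2 box filter (assumed stand-in for a Gaussian reduce)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]  # crop to even size so 2x2 blocks tile exactly
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img, shape):
    """Nearest-neighbour expand to `shape`, edge-padding if the target size is odd."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    pad_h = max(shape[0] - up.shape[0], 0)
    pad_w = max(shape[1] - up.shape[1], 0)
    up = np.pad(up, ((0, pad_h), (0, pad_w)), mode="edge")
    return up[:shape[0], :shape[1]]

def laplacian_pyramid(img, levels=3):
    """Return [detail_0, ..., detail_{levels-2}, coarsest] for a 2-D array."""
    pyramid = []
    current = img.astype(np.float64)
    for _ in range(levels - 1):
        smaller = downsample(current)
        # Detail layer: what is lost when going one scale coarser.
        pyramid.append(current - upsample(smaller, current.shape))
        current = smaller
    pyramid.append(current)  # low-frequency residual
    return pyramid

class ELM:
    """Single-hidden-layer ELM: random fixed hidden weights, least-squares output weights."""

    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of the random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        targets = np.eye(n_classes)[y]  # one-hot labels
        # Output weights via the Moore-Penrose pseudo-inverse of the hidden activations.
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta).argmax(axis=1)

# Usage on synthetic data (random "depth frames", not a real dataset).
rng = np.random.default_rng(0)
frames = rng.random((10, 32, 32))
template = frames.max(axis=0)  # placeholder template, not necessarily the paper's DMI
levels = laplacian_pyramid(template, levels=3)
feature = np.concatenate([level.ravel() for level in levels])  # placeholder for HOG
print([level.shape for level in levels], feature.shape)
```

The detail layers keep what changes between adjacent scales, which is the general property a Laplacian pyramid offers for emphasizing multi-scale structure. The ELM trains in closed form because only the output weights are fitted.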


Updated: 2021-07-24
