Human Action Recognition Based on Multi-scale Feature Maps from Depth Video Sequences

arXiv - CS - Multimedia, Pub Date: 2021-01-19, DOI: arxiv-2101.07618
Chang Li, Qian Huang, Xing Li, Qianhan Wu

Human action recognition is an active research area in computer vision. Although great progress has been made, previous methods mostly recognize actions from depth data at a single scale, and thus often neglect multi-scale features that provide additional information for action recognition in practical application scenarios. In this paper, we present a novel framework that focuses on multi-scale motion information to recognize human actions from depth video sequences. We propose a multi-scale feature map called Laplacian pyramid depth motion images (LP-DMI). We employ depth motion images (DMI) as templates to generate a multi-scale static representation of actions. We then calculate LP-DMI to enhance the multi-scale dynamic information of motions and reduce redundant static information in human bodies. We further extract a multi-granularity descriptor called LP-DMI-HOG to provide more discriminative features. Finally, we use an extreme learning machine (ELM) for action classification. The proposed method yields recognition accuracies of 93.41%, 85.12%, and 91.94% on the public MSRAction3D, UTD-MHAD, and DHA datasets, respectively. Through extensive experiments, we show that our method outperforms state-of-the-art benchmarks.
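
To make the multi-scale idea concrete, the following is a minimal sketch of building a Laplacian pyramid over a single motion image, not the authors' implementation. Several details are assumptions made for illustration: the abstract does not define how a DMI is computed, so `depth_motion_image` simply takes the per-pixel maximum over frames; 2x2 average pooling and nearest-neighbour upsampling stand in for the Gaussian filtering normally used in a Laplacian pyramid; and all function names are hypothetical. The HOG descriptor and the ELM classifier are omitted.

```python
import numpy as np


def depth_motion_image(frames):
    """Collapse a (T, H, W) depth clip into one image.

    Assumption: per-pixel maximum over time. The abstract does not give
    the DMI formula, so this is only a placeholder.
    """
    return np.asarray(frames, dtype=np.float64).max(axis=0)


def downsample(img):
    """Halve the resolution with 2x2 average pooling.

    Stand-in for the usual Gaussian blur followed by decimation.
    Odd trailing rows or columns are cropped before pooling.
    """
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(img, shape):
    """Nearest-neighbour upsampling to `shape`, replicating edges if needed."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    pad_h, pad_w = shape[0] - up.shape[0], shape[1] - up.shape[1]
    return np.pad(up, ((0, pad_h), (0, pad_w)), mode="edge")


def laplacian_pyramid(img, levels=3):
    """Return `levels` band-pass images plus the coarsest low-pass residual."""
    bands, current = [], img
    for _ in range(levels):
        smaller = downsample(current)
        # Each band keeps the detail lost when moving to the coarser scale.
        bands.append(current - upsample(smaller, current.shape))
        current = smaller
    return bands, current


# Synthetic stand-in for a depth clip: 10 frames of 64x48 pixels.
rng = np.random.default_rng(0)
frames = rng.random((10, 64, 48))
dmi = depth_motion_image(frames)
bands, residual = laplacian_pyramid(dmi, levels=3)
print([b.shape for b in bands], residual.shape)
```

Each band isolates detail at one scale, so a descriptor such as HOG could then be computed per band and concatenated before classification. Adding the upsampled residual back to the bands, coarsest first, reconstructs the input image exactly, which is a convenient sanity check for this kind of decomposition.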

Updated: 2021-01-20
