Multi-view region-adaptive multi-temporal DMM and RGB action recognition, Pattern Analysis and Applications

Multi-view region-adaptive multi-temporal DMM and RGB action recognition
Pattern Analysis and Applications (IF 3.9), Pub Date: 2020-04-21, DOI: 10.1007/s10044-020-00886-5
Mahmoud Al-Faris, John P. Chiverton, Yanyan Yang, David Ndzi

Human action recognition remains an important yet challenging task. This work proposes a novel action recognition system. It uses a novel multi-view region-adaptive multi-resolution-in-time depth motion map (MV-RAMDMM) formulation combined with appearance information. Multi-stream 3D convolutional neural networks (CNNs) are trained on the different views and time resolutions of the region-adaptive depth motion maps. Multiple views are synthesised to enhance the view invariance. The region-adaptive weights, based on localised motion, accentuate and differentiate parts of actions possessing faster motion. Dedicated 3D CNN streams for multi-time resolution appearance information are also included. These help to identify and differentiate between small object interactions. A pre-trained 3D-CNN is used here with fine-tuning for each stream along with multi-class support vector machines. Average score fusion is used on the output. The developed approach is capable of recognising both human action and human–object interaction. Three public-domain data-sets, namely MSR 3D Action, Northwestern UCLA multi-view actions and MSR 3D daily activity, are used to evaluate the proposed solution. The experimental results demonstrate the robustness of this approach compared with state-of-the-art algorithms.
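
To make two of the building blocks named in the abstract more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes the plain depth motion map definition (accumulated absolute differences between depth frames) and uses a frame stride, `step`, as a hypothetical stand-in for time resolution. The region-adaptive weights, view synthesis, 3D CNN streams and SVMs are omitted because the abstract does not specify them. All function names are illustrative.

```python
from typing import List, Sequence

Frame = List[List[float]]  # one depth frame stored as rows of depth values


def depth_motion_map(frames: Sequence[Frame], step: int = 1) -> Frame:
    """Accumulate absolute depth differences between frames `step` apart.

    `step` is an assumed way to vary the time resolution; the paper's own
    multi-resolution-in-time scheme may differ.
    """
    if len(frames) <= step:
        raise ValueError("need more than `step` frames")
    rows, cols = len(frames[0]), len(frames[0][0])
    dmm = [[0.0] * cols for _ in range(rows)]
    for t in range(len(frames) - step):
        earlier, later = frames[t], frames[t + step]
        for r in range(rows):
            for c in range(cols):
                dmm[r][c] += abs(later[r][c] - earlier[r][c])
    return dmm


def average_score_fusion(stream_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class scores produced by several streams."""
    n_streams = len(stream_scores)
    n_classes = len(stream_scores[0])
    return [
        sum(scores[k] for scores in stream_scores) / n_streams
        for k in range(n_classes)
    ]


def predict(stream_scores: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest fused score."""
    fused = average_score_fusion(stream_scores)
    return max(range(len(fused)), key=fused.__getitem__)
```

In this sketch, each stream (one per view and time resolution, for depth motion maps and for RGB appearance) would contribute one row of class scores to `stream_scores`, and `predict` picks the class with the highest average.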

Updated: 2020-04-21
