View Adaptive Neural Networks for High Performance Skeleton-Based Human Action Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 2019-01-31, DOI: 10.1109/tpami.2019.2896631
Pengfei Zhang , Cuiling Lan , Junliang Xing , Wenjun Zeng , Jianru Xue , Nanning Zheng

Skeleton-based human action recognition has recently attracted increasing attention thanks to the accessibility and popularity of 3D skeleton data. One of the key challenges in action recognition lies in the large variation of action representations when they are captured from different viewpoints. To alleviate the effects of view variation, this paper introduces a novel view adaptation scheme, which automatically determines the virtual observation viewpoints over the course of an action in a learning-based, data-driven manner. Instead of re-positioning the skeletons using a fixed, human-defined prior criterion, we design two view adaptive neural networks, VA-RNN and VA-CNN, built on a recurrent neural network (RNN) with Long Short-Term Memory (LSTM) and on a convolutional neural network (CNN), respectively. In each network, a novel view adaptation module learns and determines the most suitable observation viewpoints and transforms the skeletons to those viewpoints for end-to-end recognition with a main classification network. Ablation studies find that the proposed view adaptive models are capable of transforming skeletons captured from various views to much more consistent virtual viewpoints. The models thus largely eliminate the influence of viewpoint, enabling the networks to focus on learning action-specific features and resulting in superior performance. In addition, we design a two-stream scheme (referred to as VA-fusion) that fuses the scores of the two networks to produce the final prediction, obtaining enhanced performance. Moreover, random rotation of skeleton sequences is employed during training to improve the robustness of the view adaptation models and alleviate overfitting. Extensive experimental evaluations on five challenging benchmarks demonstrate the effectiveness of the proposed view-adaptive networks and their superior performance over state-of-the-art approaches.
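The core of the view adaptation module described above is a learned rigid transform: the network regresses rotation angles and a translation for each frame, then re-observes the skeleton from that virtual viewpoint. The following is a minimal sketch of that transform in numpy; the function names and the fixed angle/translation values are hypothetical stand-ins for what the paper's subnetwork would regress per frame, not the authors' implementation.

```python
import numpy as np

def rotation_matrix(alpha, beta, gamma):
    """Compose rotations about the X, Y, and Z axes (angles in radians)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return rz @ ry @ rx

def view_adapt(skeleton, angles, translation):
    """Re-observe one skeleton frame (J x 3 joint coordinates) from a
    virtual viewpoint defined by rotation angles and a translation."""
    rot = rotation_matrix(*angles)
    return (skeleton - translation) @ rot.T

# Toy frame of 5 joints. In the paper, the angles and translation would be
# predicted per frame by a small subnetwork; here they are fixed examples.
frame = np.random.rand(5, 3)
adapted = view_adapt(frame,
                     angles=(0.1, -0.3, 0.5),
                     translation=np.array([0.2, 0.0, 0.1]))
```

Because the transform is rigid, joint-to-joint distances (and hence the action content of the skeleton) are preserved; only the observation viewpoint changes, which is what lets the downstream classifier see view-consistent inputs.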

Updated: 2024-08-22