Rethinking the ST-GCNs for 3D skeleton-based human action recognition
Neurocomputing (IF 5.5), Pub Date: 2021-05-06, DOI: 10.1016/j.neucom.2021.05.004
Wei Peng, Jingang Shi, Tuomas Varanka, Guoying Zhao

Skeleton data has become an alternative input for human action recognition, as it is more compact and discriminative than traditional RGB input. However, unlike RGB input, skeleton data lies in a non-Euclidean space, where traditional deep learning methods cannot exploit their full potential. Fortunately, with the emerging trend of geometric deep learning, the spatial-temporal graph convolutional network (ST-GCN) has been proposed to address action recognition from skeleton data. ST-GCN and its variants fit skeleton-based action recognition well and are becoming the mainstream frameworks for this task. However, both the efficiency and the performance of these models are hindered either by fixing the skeleton joint correlations or by adopting a computationally expensive strategy to construct a dynamic skeleton topology. We argue that many of these operations are unnecessary or even harmful for the task. By analysing state-of-the-art ST-GCNs theoretically and experimentally, we provide a simple but efficient strategy that captures global graph correlations and thus efficiently models the representation of the input graph sequences. Moreover, the global graph strategy reduces the graph sequence to a Euclidean-space representation, so a multi-scale temporal filter is introduced to capture the dynamic information efficiently. With this method, we not only extract the graph correlations better with far fewer parameters (only 12.6% of the current best model), but also achieve superior performance. Extensive experiments on the current largest 3D skeleton datasets, NTU-RGB+D and NTU-RGB+D 120, demonstrate the efficiency and lightweight superiority of our network on this task.
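To make the abstract's building blocks concrete, the following is a minimal numpy sketch of the generic ST-GCN pipeline it discusses: a spatial graph convolution that aggregates features over the skeleton's joint adjacency, followed by multi-scale temporal filtering along the frame axis. This is a toy illustration, not the authors' implementation; the joint count, edges, and filter sizes are invented for the example.

```python
import numpy as np

# Hypothetical toy skeleton sequence: T=8 frames, V=5 joints, C=3 coordinates.
T, V, C = 8, 5, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((T, V, C))

# A toy 5-joint chain as the skeleton graph, with self-loops on the diagonal.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
A = np.eye(V)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Symmetric normalization D^{-1/2} A D^{-1/2}, as in standard GCNs.
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt

# Spatial graph convolution applied per frame: aggregate each joint's
# neighbours via A_hat, then mix channels with a learnable weight W.
C_out = 4
W = rng.standard_normal((C, C_out))
H = np.einsum("uv,tvc,cd->tud", A_hat, X, W)   # shape (T, V, C_out)

# Multi-scale temporal filtering: pool each joint's features over several
# window sizes along time and concatenate, capturing motion at multiple rates.
def temporal_avg(h, k):
    pad = k // 2
    hp = np.pad(h, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    return np.stack([hp[t:t + k].mean(axis=0) for t in range(h.shape[0])])

multi = np.concatenate([temporal_avg(H, k) for k in (3, 5, 7)], axis=-1)
print(H.shape, multi.shape)   # (8, 5, 4) (8, 5, 12)
```

The dynamic-topology variants the paper critiques replace the fixed `A_hat` with a matrix recomputed from the input at every layer; the paper's point is that a single global correlation strategy can serve instead, at a fraction of the parameter cost.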




Updated: 2021-05-24