Multiscale Spatio-Temporal Graph Neural Networks for 3D Skeleton-Based Motion Prediction
IEEE Transactions on Image Processing (IF 10.6) Pub Date: 2021-09-10, DOI: 10.1109/tip.2021.3108708
Maosen Li , Siheng Chen , Yangheng Zhao , Ya Zhang , Yanfeng Wang , Qi Tian

We propose a multiscale spatio-temporal graph neural network (MST-GNN) to predict future 3D skeleton-based human poses in an action-category-agnostic manner. The core of MST-GNN is a multiscale spatio-temporal graph that explicitly models the relations in motions at various spatial and temporal scales. Unlike many previous hierarchical structures, our multiscale spatio-temporal graph is built in a data-adaptive fashion, which captures nonphysical, yet motion-based relations. The key module of MST-GNN is a multiscale spatio-temporal graph computational unit (MST-GCU) based on the trainable graph structure. MST-GCU embeds underlying features at individual scales and then fuses features across scales to obtain a comprehensive representation. The overall architecture of MST-GNN follows an encoder-decoder framework: the encoder consists of a sequence of MST-GCUs to learn the spatial and temporal features of motions, and the decoder uses a graph-based attention gated recurrent unit (GA-GRU) to generate future poses. Extensive experiments show that the proposed MST-GNN outperforms state-of-the-art methods in both short-term and long-term motion prediction on the Human 3.6M, CMU Mocap, and 3DPW datasets: on Human 3.6M, it improves on previous works in mean angle error by 5.33% and 3.67% on average for short-term and long-term prediction, respectively; on CMU Mocap, by 11.84% and 4.71%; and on 3DPW, by 1.13% on average. We further investigate the learned multiscale graphs for interpretability.
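The two ingredients described above, graph convolution over a skeleton and pooling joints into coarser scales, can be illustrated with a toy sketch. This is not the authors' code: the pure-Python matrix helpers and the `pool_joints` grouping are hypothetical stand-ins, showing only the generic pattern of propagating features over a normalized adjacency and averaging joints into body-part nodes for a coarser spatial scale.

```python
# Toy sketch of two building blocks behind multiscale skeleton graphs
# (illustrative only, not the MST-GNN implementation):
#   1) one graph-convolution step X' = A_hat @ X @ W
#   2) a coarser spatial scale obtained by averaging joints per body part.

def matmul(A, B):
    # Plain-Python matrix product, to stay dependency-free.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def normalize_adj(A):
    # Row-normalized adjacency with self-loops: A_hat = D^-1 (A + I).
    n = len(A)
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    for i in range(n):
        d = sum(A_hat[i])
        A_hat[i] = [v / d for v in A_hat[i]]
    return A_hat

def graph_conv(X, A, W):
    # Aggregate each joint's neighborhood, then project with weights W.
    return matmul(matmul(normalize_adj(A), X), W)

def pool_joints(X, groups):
    # Coarser scale: average joint features within each body-part group
    # (the grouping scheme here is a hypothetical example).
    return [[sum(X[j][f] for j in g) / len(g) for f in range(len(X[0]))]
            for g in groups]

# Toy 4-joint chain "skeleton" with 2-D features and identity weights.
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
W = [[1.0, 0.0], [0.0, 1.0]]

Y = graph_conv(X, A, W)                  # fine-scale joint features
coarse = pool_joints(X, [[0, 1], [2, 3]])  # two "body-part" nodes
```

In MST-GNN the adjacency itself is trainable rather than fixed like `A` here, and features are fused back across scales after per-scale processing; the sketch only shows the per-scale propagation and pooling pattern.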

Updated: 2021-09-17