Synthetic Humans for Action Recognition from Unseen Viewpoints
International Journal of Computer Vision (IF 11.6). Pub Date: 2021-05-12. DOI: 10.1007/s11263-021-01467-7
Gül Varol, Ivan Laptev, Cordelia Schmid, Andrew Zisserman

Although synthetic training data has been shown to be beneficial for tasks such as human pose estimation, its use for RGB human action recognition remains relatively unexplored. Our goal in this work is to answer the question of whether synthetic humans can improve the performance of human action recognition, with a particular focus on generalization to unseen viewpoints. We make use of recent advances in monocular 3D human body reconstruction from real action sequences to automatically render synthetic training videos for the action labels. We make the following contributions: (1) we investigate the extent of variations and augmentations that are beneficial to improving performance at new viewpoints. We consider changes in body shape and clothing for individuals, as well as more action-relevant augmentations such as non-uniform frame sampling and interpolating between the motions of individuals performing the same action; (2) we introduce a new data generation methodology, SURREACT, that allows training of spatio-temporal CNNs for action classification; (3) we substantially improve the state-of-the-art action recognition performance on the NTU RGB+D and UESTC standard human action multi-view benchmarks; and finally, (4) we extend the augmentation approach to in-the-wild videos from a subset of the Kinetics dataset to investigate the case when only one-shot training data is available, and demonstrate improvements in this case as well.
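The two action-relevant augmentations named in the abstract — non-uniform frame sampling and interpolating between the motions of two individuals performing the same action — can be sketched in a few lines. The sketch below is illustrative only (the function names and pose-array layout are assumptions, not the paper's implementation): frames are drawn at sorted random positions instead of a fixed stride, and two same-action pose sequences are blended linearly.

```python
import numpy as np

def nonuniform_sample(num_frames, clip_len, rng=None):
    """Pick clip_len frame indices with random, non-uniform spacing.

    Sorting a random draw without replacement yields an increasing
    index sequence whose gaps vary, unlike fixed-stride sampling.
    """
    rng = np.random.default_rng() if rng is None else rng
    return np.sort(rng.choice(num_frames, size=clip_len, replace=False))

def interpolate_motions(poses_a, poses_b, alpha=0.5):
    """Linearly blend two time-aligned pose sequences of the same action.

    poses_a, poses_b: arrays of shape (T, J, 3) (e.g. per-joint rotations
    or positions); alpha in [0, 1] mixes between the two performers.
    """
    assert poses_a.shape == poses_b.shape
    return (1.0 - alpha) * poses_a + alpha * poses_b
```

In practice the blended sequence would be fed to the renderer as a new synthetic performance; linear blending of rotation parameters is only an approximation, and the paper should be consulted for how motions are actually combined.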




Updated: 2021-05-12