View-Invariant, Occlusion-Robust Probabilistic Embedding for Human Pose
International Journal of Computer Vision (IF 19.5), Pub Date: 2021-11-16, DOI: 10.1007/s11263-021-01529-w
Ting Liu 1, Long Zhao 1,2, Jiaping Zhao 1, Liangzhe Yuan 1, Yuxiao Wang 1, Liang-Chieh Chen 1, Florian Schroff 1, Hartwig Adam 1, Jennifer J. Sun 3

Recognition of human poses and actions is crucial for autonomous systems to interact smoothly with people. However, cameras generally capture human poses as 2D images and videos, which can exhibit significant appearance variations across viewpoints, making recognition tasks challenging. To address this, we explore recognizing similarity in 3D human body poses from 2D information, which has not been well studied in existing work. Here, we propose an approach to learning a compact view-invariant embedding space from 2D body joint keypoints, without explicitly predicting 3D poses. Input ambiguities of 2D poses arising from projection and occlusion are difficult to represent through a deterministic mapping, so we adopt a probabilistic formulation for our embedding space. Experimental results show that our embedding model achieves higher accuracy when retrieving similar poses across different camera views, in comparison with 3D pose estimation models. We also show that by training a simple temporal embedding model, we achieve superior performance on pose sequence retrieval and largely reduce the embedding dimension compared with stacking frame-based embeddings, enabling efficient large-scale retrieval. Furthermore, to enable our embeddings to work with partially visible input, we investigate different keypoint occlusion augmentation strategies during training. We demonstrate that these occlusion augmentations significantly improve retrieval performance on partial 2D input poses. Results on action recognition and video alignment demonstrate that using our embeddings without any additional training achieves competitive performance relative to other models specifically trained for each task.
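The two core ideas in the abstract — a probabilistic (Gaussian) embedding of 2D keypoints and keypoint occlusion augmentation during training — can be illustrated with a minimal sketch. Everything below is hypothetical: the linear mappings `W_mu` and `W_logvar` stand in for the paper's learned network, and `occlude_keypoints` is one simple way to simulate partially visible input, not the paper's exact augmentation strategy.

```python
import numpy as np

rng = np.random.default_rng(0)

def occlude_keypoints(keypoints, visibility_rate=0.8, rng=rng):
    """Randomly mask 2D joints to simulate occlusion (toy augmentation)."""
    mask = rng.random(len(keypoints)) < visibility_rate
    occluded = keypoints.copy()
    occluded[~mask] = 0.0  # zero out "occluded" joints
    return occluded, mask

def probabilistic_embedding(keypoints, W_mu, W_logvar):
    """Map flattened 2D keypoints to a Gaussian (mean, stddev) in embedding space,
    rather than a single deterministic point."""
    x = keypoints.reshape(-1)
    mu = W_mu @ x
    sigma = np.exp(0.5 * (W_logvar @ x))  # stddev from a log-variance head
    return mu, sigma

# Toy setup: 13 body joints in 2D, a 16-dimensional embedding.
n_joints, d = 13, 16
kp = rng.random((n_joints, 2))
W_mu = rng.standard_normal((d, 2 * n_joints)) * 0.1
W_logvar = rng.standard_normal((d, 2 * n_joints)) * 0.01

kp_occluded, visible = occlude_keypoints(kp)
mu, sigma = probabilistic_embedding(kp_occluded, W_mu, W_logvar)
print(mu.shape, sigma.shape)  # each (16,)
```

The variance output lets ambiguous inputs (heavily occluded or projection-ambiguous poses) map to broader distributions, which a deterministic point embedding cannot express.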




Updated: 2021-11-17