A-NeRF: Surface-free Human 3D Pose Refinement via Neural Rendering
arXiv - CS - Graphics Pub Date : 2021-02-11 , DOI: arxiv-2102.06199
Shih-Yang Su, Frank Yu, Michael Zollhoefer, Helge Rhodin

While deep learning has reshaped the classical motion capture pipeline, generative, analysis-by-synthesis elements are still used to recover fine details when a high-quality 3D model of the user is available. Unfortunately, obtaining such a model for every user a priori is challenging and time-consuming, and it limits the application scenarios. We propose a novel test-time optimization approach for monocular motion capture that learns a volumetric body model of the user in a self-supervised manner. To this end, our approach combines the advantages of neural radiance fields with an articulated skeleton representation. Our proposed skeleton embedding serves as a common reference that links constraints across time, thereby reducing the number of required camera views from the traditional dozens of calibrated cameras down to a single uncalibrated one. As a starting point, we employ the output of an off-the-shelf model that predicts the 3D skeleton pose. The volumetric body shape and appearance are then learned from scratch, while jointly refining the initial pose estimate. Our approach is self-supervised and does not require any additional ground truth labels for appearance, pose, or 3D shape. We demonstrate that our novel combination of a discriminative pose estimation technique with surface-free analysis-by-synthesis outperforms purely discriminative monocular pose estimation approaches and generalizes well to multiple views.
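
The abstract does not spell out how the skeleton embedding is computed. As a rough illustration of why a skeleton-relative encoding can act as a common reference across time, the sketch below expresses a query point in the local frame of each bone, so that a point moving rigidly with a bone receives the same features in every frame. All names here (rotation_z, world_to_bone, bone_to_world, skeleton_embedding) and the choice of features are assumptions made for illustration, not the authors' implementation.

```python
import math

def rotation_z(angle):
    """3x3 rotation about the z-axis, as row-major nested lists."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def bone_to_world(local, rotation, origin):
    """Map a bone-local point to world space: R @ local + origin."""
    return [sum(rotation[r][c] * local[c] for c in range(3)) + origin[r]
            for r in range(3)]

def world_to_bone(point, rotation, origin):
    """Map a world-space point to the bone's local frame: R^T @ (point - origin)."""
    d = [point[i] - origin[i] for i in range(3)]
    # Multiplying by the transpose inverts a rotation matrix.
    return [sum(rotation[r][c] * d[r] for r in range(3)) for c in range(3)]

def skeleton_embedding(point, bones):
    """Hypothetical embedding: per bone, local coordinates plus distance to the bone origin.

    `bones` is a list of (rotation, origin) pairs describing the pose in one frame.
    """
    features = []
    for rotation, origin in bones:
        local = world_to_bone(point, rotation, origin)
        features.extend(local)
        features.append(math.sqrt(sum(v * v for v in local)))
    return features

# The same body point observed under two different poses of a single bone.
attached = [0.1, 0.2, 0.3]                           # fixed in the bone's frame
pose_a = (rotation_z(0.0), [0.0, 0.0, 0.0])
pose_b = (rotation_z(math.pi / 3), [1.0, -2.0, 0.5])

emb_a = skeleton_embedding(bone_to_world(attached, *pose_a), [pose_a])
emb_b = skeleton_embedding(bone_to_world(attached, *pose_b), [pose_b])
print(max(abs(a - b) for a, b in zip(emb_a, emb_b)))  # difference between the two embeddings
```

In this toy setting the world-space position of the point changes between the two poses while its embedding stays the same up to floating-point error, which is the property that would let a single volumetric model be shared across all frames of a video.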

Updated: 2021-02-12