FLEX: Parameter-free Multi-view 3D Human Motion Reconstruction
arXiv - CS - Graphics, Pub Date: 2021-05-05, DOI: arxiv-2105.01937
Brian Gordon, Sigal Raab, Guy Azov, Raja Giryes, Daniel Cohen-Or

The increasing availability of video recordings made by multiple cameras has offered new means for mitigating occlusion and depth ambiguities in pose and motion reconstruction methods. Yet, multi-view algorithms strongly depend on camera parameters, in particular, the relative positions among the cameras. Such dependency becomes a hurdle when moving to dynamic capture in uncontrolled settings. We introduce FLEX (Free muLti-view rEconstruXion), an end-to-end parameter-free multi-view model. FLEX is parameter-free in the sense that it does not require any camera parameters, either intrinsic or extrinsic. Our key idea is that the 3D angles between skeletal parts, as well as bone lengths, are invariant to the camera position. Hence, learning 3D rotations and bone lengths rather than locations allows predicting common values for all camera views. Our network takes multiple video streams, learns fused deep features through a novel multi-view fusion layer, and reconstructs a single consistent skeleton with temporally coherent joint rotations. We demonstrate quantitative and qualitative results on the Human3.6M and KTH Multi-view Football II datasets. We compare our model to state-of-the-art methods that are not parameter-free and show that, in the absence of camera parameters, we outperform them by a large margin, while obtaining comparable results when camera parameters are available. Code, trained models, a video demonstration, and additional materials will be available on our project page.
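
The abstract's key idea, that bone lengths and the angles between skeletal parts do not change with camera position, can be checked numerically. The Python sketch below is a toy illustration under stated assumptions, not FLEX's network or released code. The four-joint chain, the convention that each bone points along its joint's local +y axis, and the helper names (`forward_kinematics`, `rigid_transform`, `angle_at_parent`) are all hypothetical. A second camera is modelled simply as a rigid rotation plus translation of the scene. Forward kinematics rebuilds joint positions from per-joint rotations and bone lengths. Moving to the other viewpoint changes every joint position, while the bone lengths and inter-bone angles stay the same.

```python
import math

# Hypothetical 4-joint chain: root(0) -> 1 -> 2 -> 3. PARENTS[j] is the parent
# of joint j (-1 for the root). Topology and values are illustrative only.
PARENTS = [-1, 0, 1, 2]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(m, v):
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def rot_x(t):
    """Rotation by t radians about the x axis."""
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_z(t):
    """Rotation by t radians about the z axis."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def forward_kinematics(bone_lengths, local_rots, root_pos=(0.0, 0.0, 0.0)):
    """Rebuild joint positions from local joint rotations and bone lengths.

    Assumed convention: bone j runs from PARENTS[j] to j along the +y axis of
    joint j's global frame; bone_lengths[0] (the root) is unused.
    """
    positions = [list(root_pos)]
    global_rots = [local_rots[0]]
    for j in range(1, len(PARENTS)):
        p = PARENTS[j]
        g = matmul(global_rots[p], local_rots[j])  # compose along the chain
        offset = matvec(g, [0.0, bone_lengths[j], 0.0])
        positions.append([positions[p][i] + offset[i] for i in range(3)])
        global_rots.append(g)
    return positions


def rigid_transform(positions, rot, trans):
    """Express points in another (hypothetical) camera frame: x' = R x + t."""
    out = []
    for p in positions:
        rp = matvec(rot, p)
        out.append([rp[i] + trans[i] for i in range(3)])
    return out


def bone_lengths_of(positions):
    """Length of every non-root bone."""
    return [math.dist(positions[j], positions[PARENTS[j]])
            for j in range(1, len(PARENTS))]


def angle_at_parent(positions, j):
    """Angle (radians) at joint PARENTS[j] between bone j and the parent bone.

    Requires joint j to have a grandparent (j >= 2 in this chain).
    """
    p, gp = PARENTS[j], PARENTS[PARENTS[j]]
    u = [positions[gp][i] - positions[p][i] for i in range(3)]
    v = [positions[j][i] - positions[p][i] for i in range(3)]
    cos_a = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos_a)))  # clamp rounding noise


# One pose: shared bone lengths and local rotations (illustrative values).
lengths = [0.0, 0.5, 0.4, 0.2]
local = [rot_z(0.0), rot_x(0.3), rot_z(0.6), rot_x(-0.4)]
view_a = forward_kinematics(lengths, local)

# A second, hypothetical camera: rotate the whole scene and shift it.
cam_rot, cam_t = rot_z(1.2), (2.0, -1.0, 3.0)
view_b = rigid_transform(view_a, cam_rot, cam_t)

# The same view_b is obtained by changing only the root rotation and position.
view_b_fk = forward_kinematics(lengths, [matmul(cam_rot, local[0])] + local[1:],
                               root_pos=cam_t)

print("end joint, view A:", [round(x, 3) for x in view_a[3]])
print("end joint, view B:", [round(x, 3) for x in view_b[3]])
print("bone lengths A/B:", bone_lengths_of(view_a), bone_lengths_of(view_b))
print("angles A/B:", [angle_at_parent(view_a, j) for j in (2, 3)],
      [angle_at_parent(view_b, j) for j in (2, 3)])
```

In this toy setup the change of viewpoint is absorbed entirely by the root rotation and root position (`view_b_fk` matches `view_b`). The remaining local rotations and bone lengths are therefore one set of values valid for both views. The sketch does not show how FLEX itself handles the global orientation or fuses features across views.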

Updated: 2021-05-06