A generalizable approach for multi-view 3D human pose regression
Machine Vision and Applications (IF 3.3), Pub Date: 2020-10-08, DOI: 10.1007/s00138-020-01120-2
Abdolrahim Kadkhodamohammadi, Nicolas Padoy

Despite the significant improvement in the performance of monocular pose estimation approaches and their ability to generalize to unseen environments, multi-view approaches are often lagging behind in terms of accuracy and are specific to certain datasets. This is mainly due to the fact that (1) contrary to real-world single-view datasets, multi-view datasets are often captured in controlled environments to collect precise 3D annotations, which do not cover all real-world challenges, and (2) the model parameters are learned for specific camera setups. To alleviate these problems, we propose a two-stage approach to detect and estimate 3D human poses, which separates single-view pose detection from multi-view 3D pose estimation. This separation enables us to utilize each dataset for the right task, i.e. single-view datasets for constructing robust pose detection models and multi-view datasets for constructing precise multi-view 3D regression models. In addition, our 3D regression approach only requires 3D pose data and its projections to the views for building the model, hence removing the need for collecting annotated data from the test setup. Our approach can therefore be easily generalized to a new environment by simply projecting 3D poses into 2D during training according to the camera setup used at test time. As 2D poses are collected at test time using a single-view pose detector, which might generate inaccurate detections, we model its characteristics and incorporate this information during training. We demonstrate that incorporating the detector’s characteristics is important to build a robust 3D regression model and that the resulting regression model generalizes well to new multi-view environments. Our evaluation results show that our approach achieves competitive results on the Human3.6M dataset and significantly improves results on a multi-view clinical dataset that is the first multi-view dataset generated from live surgery recordings.
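The training trick described above, projecting 3D poses into the cameras of the target setup and corrupting the projections to mimic the single-view detector's errors, can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the pinhole projection, the Gaussian pixel-noise level, and the joint-dropout probability are assumptions made for the sketch.

```python
# Sketch of generating multi-view training inputs for the 3D regression model:
# project ground-truth 3D poses into each camera of the test-time setup and
# simulate detector noise, so no annotated data from that setup is required.
import numpy as np

def project_to_view(joints_3d, K, R, t):
    """Project Nx3 world-space joints into one camera (pinhole model).

    K: 3x3 intrinsics, R: 3x3 rotation, t: 3-vector translation.
    Returns Nx2 pixel coordinates.
    """
    cam = joints_3d @ R.T + t          # world -> camera coordinates
    uvw = cam @ K.T                    # camera -> homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide

def simulate_detector(joints_2d, noise_std=3.0, miss_prob=0.05, rng=None):
    """Perturb 2D joints to mimic an imperfect single-view pose detector.

    Adds Gaussian pixel noise and randomly drops joints (set to NaN) so the
    3D regressor learns to cope with noisy or missing detections.
    Noise parameters here are illustrative, not from the paper.
    """
    rng = rng if rng is not None else np.random.default_rng()
    noisy = joints_2d + rng.normal(0.0, noise_std, joints_2d.shape)
    missing = rng.random(len(joints_2d)) < miss_prob
    noisy[missing] = np.nan
    return noisy

def make_training_sample(joints_3d, cameras, rng=None):
    """Build one multi-view training pair from a single 3D pose.

    `cameras` is a list of (K, R, t) tuples describing the test-time setup;
    only the calibration is needed, not annotated images from that setup.
    Returns (num_views, num_joints, 2) noisy 2D inputs and the 3D target.
    """
    views = [simulate_detector(project_to_view(joints_3d, K, R, t), rng=rng)
             for (K, R, t) in cameras]
    return np.stack(views), joints_3d
```

Generating samples this way ties the regression model to the geometry of the new camera setup while exposing it to detection errors, which is the generalization mechanism the abstract argues for.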




Last updated: 2020-10-11