Learning Single-Image 3D Reconstruction by Generative Modelling of Shape, Pose and Shading
International Journal of Computer Vision (IF 19.5), Pub Date: 2019-10-16, DOI: 10.1007/s11263-019-01219-8
Paul Henderson , Vittorio Ferrari

We present a unified framework tackling two problems: class-specific 3D reconstruction from a single image, and generation of new 3D shape samples. These tasks have received considerable attention recently; however, most existing approaches rely on 3D supervision, annotation of 2D images with keypoints or poses, and/or training with multiple views of each object instance. Our framework is very general: it can be trained in similar settings to existing approaches, while also supporting weaker supervision. Importantly, it can be trained purely from 2D images, without pose annotations, and with only a single view per instance. We employ meshes as an output representation, instead of voxels used in most prior work. This allows us to reason over lighting parameters and exploit shading information during training, which previous 2D-supervised methods cannot. Thus, our method can learn to generate and reconstruct concave object classes. We evaluate our approach in various settings, showing that: (i) it learns to disentangle shape from pose and lighting; (ii) using shading in the loss improves performance compared to just silhouettes; (iii) when using a standard single white light, our model outperforms state-of-the-art 2D-supervised methods, both with and without pose supervision, thanks to exploiting shading cues; (iv) performance improves further when using multiple coloured lights, even approaching that of state-of-the-art 3D-supervised methods; (v) shapes produced by our model capture smooth surfaces and fine details better than voxel-based approaches; and (vi) our approach supports concave classes such as bathtubs and sofas, which methods based on silhouettes cannot learn.
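The abstract attributes the ability to learn concave classes to exploiting shading, rather than only silhouettes, in the training loss. As a rough illustration only (not the paper's actual differentiable renderer or objective), the sketch below contrasts a silhouette-only loss with a simple single-white-light Lambertian shading loss; the function names, the directional-light model, and the grey-scale comparison are all illustrative assumptions.

```python
import numpy as np

def lambertian_shading(normals, albedo, light_dir, ambient=0.1):
    """Shade per-pixel surface normals with one directional light (assumed Lambertian).

    normals:   (H, W, 3) unit surface normals rendered from the predicted mesh
    albedo:    scalar or (H, W) diffuse reflectance
    light_dir: (3,) unit vector pointing towards the light
    """
    diffuse = np.clip(normals @ light_dir, 0.0, None)      # n . l, clamped at zero
    return albedo * (ambient + (1.0 - ambient) * diffuse)

def silhouette_loss(pred_mask, gt_mask):
    """Binary-mask reconstruction error: ignores all interior surface variation."""
    return np.mean((pred_mask - gt_mask) ** 2)

def shading_loss(pred_normals, pred_mask, gt_image, albedo, light_dir):
    """Compare the shaded rendering with the observed grey-scale image,
    restricted to pixels covered by the predicted silhouette."""
    shaded = lambertian_shading(pred_normals, albedo, light_dir)
    return np.sum(pred_mask * (shaded - gt_image) ** 2) / (pred_mask.sum() + 1e-8)

if __name__ == "__main__":
    # Tiny synthetic example: random unit normals, full-coverage mask, light along +z.
    H = W = 8
    rng = np.random.default_rng(0)
    normals = rng.normal(size=(H, W, 3))
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    mask = np.ones((H, W))
    light = np.array([0.0, 0.0, 1.0])
    target = lambertian_shading(normals, 0.7, light)        # synthetic "observation"
    print("silhouette loss:", silhouette_loss(mask, mask))
    print("shading loss:   ", shading_loss(normals, mask, target, 0.7, light))
```

The point of the toy comparison is that the silhouette term is blind to interior geometry (a concave and a flat surface with the same outline score identically), whereas the shading term penalises normals that disagree with the observed image, which is the cue the paper's 2D-supervised training relies on.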

Last updated: 2019-10-16