Learning to Predict 3D Surfaces of Sculptures from Single and Multiple Views
International Journal of Computer Vision ( IF 19.5 ) Pub Date : 2018-10-22 , DOI: 10.1007/s11263-018-1124-0
Olivia Wiles , Andrew Zisserman

The objective of this work is to reconstruct the 3D surfaces of sculptures from one or more images using a view-dependent representation. To this end, we train a network, SiDeNet, to predict the Silhouette and Depth of the surface given a variable number of images; the silhouette is predicted at a viewpoint different from the inputs (e.g. from the side), while the depth is predicted at the viewpoint of the input images. This has three benefits. First, the network learns a representation of shape beyond that of a single viewpoint, as the silhouette forces it to respect the visual hull and the depth image forces it to predict concavities (which do not appear on the visual hull). Second, as the network learns about 3D via the proxy tasks of predicting depth and silhouette images, it is not limited by the resolution of the 3D representation. Third, using a view-dependent representation (e.g. additionally encoding the viewpoint with the input image) improves the network's generalisability to unseen objects. Additionally, the network handles the input views flexibly. First, it can ingest a different number of views during training and testing, and reconstruction performance is shown to improve as additional views are added at test time. Second, the additional views need not be photometrically consistent. The network is trained and evaluated on two synthetic datasets: a realistic sculpture dataset (SketchFab) and ShapeNet. The design of the network is validated by comparison to state-of-the-art methods on a set of tasks. It is shown that (i) passing the input viewpoint (i.e. using a view-dependent representation) improves the network's generalisability at test time; (ii) predicting depth/silhouette images yields higher-quality predictions in 2D, as the network is not limited by the chosen latent 3D representation; and (iii) on both datasets, combining views in a global manner performs better than a local method.
Finally, we show that the trained network generalises to real images, and probe how the network has encoded the latent 3D shape.
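The abstract describes two architectural ideas: encoding the input viewpoint alongside each image (a view-dependent representation), and combining a variable number of views in a global, order-independent manner. As a rough illustration only (this is not the paper's actual SiDeNet architecture; the function names, feature dimensions, and the choice of element-wise max-pooling are assumptions for the sketch), the two ideas might look like:

```python
import numpy as np

def encode_view(image_feat, theta):
    """Hypothetical view-dependent encoding: append the sin/cos of the
    input viewpoint angle theta to a per-view image feature vector."""
    return np.concatenate([image_feat, [np.sin(theta), np.cos(theta)]])

def combine_views(per_view_feats):
    """Global, permutation-invariant combination of a variable number of
    per-view features via an element-wise max (one common choice for
    order-independent pooling)."""
    return np.max(np.stack(per_view_feats), axis=0)

# Toy usage: two "views" with 4-D image features at different viewpoints.
v1 = encode_view(np.array([0.2, 0.9, 0.1, 0.4]), theta=0.0)
v2 = encode_view(np.array([0.7, 0.3, 0.5, 0.4]), theta=np.pi / 2)
fused = combine_views([v1, v2])
# Reordering the views leaves the fused code unchanged:
assert np.allclose(fused, combine_views([v2, v1]))
```

A global pooling of this kind is what lets the network ingest a different number of views at training and test time, since the fused code has a fixed size regardless of how many views contribute to it.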

Updated: 2018-10-22