High-Fidelity Monocular Face Reconstruction Based on an Unsupervised Model-Based Face Autoencoder, IEEE Transactions on Pattern Analysis and Machine Intelligence

IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8), Pub Date: 2018-10-18, DOI: 10.1109/tpami.2018.2876842
Ayush Tewari, Michael Zollhofer, Florian Bernard, Pablo Garrido, Hyeongwoo Kim, Patrick Perez, Christian Theobalt

In this work, we propose a novel model-based deep convolutional autoencoder that addresses the highly challenging problem of reconstructing a 3D human face from a single in-the-wild color image. To this end, we combine a convolutional encoder network with an expert-designed generative model that serves as the decoder. The core innovation is the differentiable parametric decoder, which encapsulates image formation analytically based on a generative model. Our decoder takes as input a code vector with exactly defined semantic meaning that encodes detailed face pose, shape, expression, skin reflectance, and scene illumination. Due to this new way of combining CNN-based with model-based face reconstruction, the CNN-based encoder learns to extract semantically meaningful parameters from a single monocular input image. For the first time, a CNN encoder and an expert-designed generative model can be trained end-to-end in an unsupervised manner, which renders training on very large (unlabeled) real-world datasets feasible. The obtained reconstructions compare favorably to current state-of-the-art approaches in terms of quality and richness of representation. This work is an extended version of [1]; we additionally present a stochastic vertex sampling technique for faster training of our networks, and we propose and evaluate analysis-by-synthesis and shape-from-shading refinement approaches to achieve high-fidelity reconstruction.
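
The idea of a parametric decoder driven by a semantic code vector can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the dimensions, the random stand-in bases, the pseudo-normals, the focal length, and every function name are assumptions made for illustration. The sketch splits a code vector into pose, shape, expression, reflectance, and illumination, forms geometry and reflectance with linear models, shades with spherical harmonics (normalization constants omitted), and evaluates a color loss on a random subset of vertices. Each operation is differentiable in principle, but actual training would need an autodiff framework, and the target colors would be sampled from the input image at the projected positions rather than passed in directly.

```python
import numpy as np

# Hypothetical sizes; a real face model would define its own.
N_VERTS = 500
SIZES = {"rotation": 3, "translation": 3, "shape": 80,
         "expression": 64, "reflectance": 80, "illumination": 27}
CODE_SIZE = sum(SIZES.values())

# Map each semantic part of the code vector to a slice.
SLICES, _start = {}, 0
for _name, _size in SIZES.items():
    SLICES[_name] = slice(_start, _start + _size)
    _start += _size

# Random stand-ins for learned model bases (assumption, not real data).
_rng = np.random.default_rng(0)
MEAN_SHAPE = _rng.normal(size=(N_VERTS, 3))
SHAPE_BASIS = 0.01 * _rng.normal(size=(SIZES["shape"], N_VERTS, 3))
EXPR_BASIS = 0.01 * _rng.normal(size=(SIZES["expression"], N_VERTS, 3))
MEAN_REFL = np.full((N_VERTS, 3), 0.5)
REFL_BASIS = 0.01 * _rng.normal(size=(SIZES["reflectance"], N_VERTS, 3))


def rotation_matrix(angles):
    """Rotation matrix from Euler angles (x, y, z), applied as Rz @ Ry @ Rx."""
    (cx, cy, cz), (sx, sy, sz) = np.cos(angles), np.sin(angles)
    rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return rz @ ry @ rx


def sh_basis(normals):
    """First 9 spherical harmonics basis functions, constants omitted."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([np.ones_like(x), y, z, x, x * y, y * z,
                     3 * z**2 - 1, x * z, x**2 - y**2], axis=1)


def decode(code, focal=500.0):
    """Map a code vector to projected 2D positions and shaded vertex colors."""
    p = {name: code[s] for name, s in SLICES.items()}
    verts = (MEAN_SHAPE
             + np.tensordot(p["shape"], SHAPE_BASIS, axes=1)
             + np.tensordot(p["expression"], EXPR_BASIS, axes=1))
    refl = MEAN_REFL + np.tensordot(p["reflectance"], REFL_BASIS, axes=1)
    rot = rotation_matrix(p["rotation"])
    cam = verts @ rot.T + p["translation"]
    pix = focal * cam[:, :2] / cam[:, 2:3]          # perspective projection
    # Pseudo-normals from the centroid; a real mesh would use face normals.
    normals = verts - verts.mean(axis=0)
    normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    light = p["illumination"].reshape(9, 3)         # 9 coefficients per channel
    colors = refl * (sh_basis(normals @ rot.T) @ light)
    return pix, colors


def photometric_loss(code, target_colors, n_samples=100, rng=None):
    """Mean squared color error on a random subset of vertices."""
    rng = np.random.default_rng(1) if rng is None else rng
    _, colors = decode(code)
    idx = rng.choice(N_VERTS, size=n_samples, replace=False)
    return float(np.mean(np.sum((colors[idx] - target_colors[idx])**2, axis=1)))


# Usage: neutral face, 10 units in front of the camera, ambient white light.
code = np.zeros(CODE_SIZE)
code[SLICES["translation"]] = [0.0, 0.0, 10.0]
code[SLICES["illumination"]][:3] = 1.0              # ambient term per channel
pix, colors = decode(code)
```

In a full autoencoder, an encoder network would predict `code` from the image, and the loss gradient would flow back through `decode` into the encoder weights. Sampling a subset of vertices per iteration, as in `photometric_loss`, trades a noisier loss estimate for less computation per step.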

Updated: 2024-08-22
