Relightable Neural Video Portrait
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2021-07-30 , DOI: arxiv-2107.14735
Youjia Wang, Taotao Zhou, Minzhang Li, Teng Xu, Minye Wu, Lan Xu, Jingyi Yu

Photo-realistic facial video portrait reenactment benefits virtual production and numerous VR/AR experiences. The task remains challenging because the portrait must maintain high realism and consistency with the target environment. In this paper, we present a relightable neural video portrait, a simultaneous relighting and reenactment scheme that transfers the head pose and facial expressions from a source actor to a portrait video of a target actor with arbitrary new backgrounds and lighting conditions. Our approach combines 4D reflectance field learning, model-based facial performance capture and target-aware neural rendering. Specifically, we adopt a rendering-to-video translation network to first synthesize high-quality OLAT imagesets and alpha mattes from hybrid facial performance capture results. We then design a semantic-aware facial normalization scheme to enable reliable explicit control, as well as a multi-frame multi-task learning strategy that simultaneously encodes content, segmentation and temporal information for high-quality reflectance field inference. After training, our approach further enables photo-realistic and controllable video portrait editing of the target performer. Reliable face pose and expression editing is obtained by applying the same hybrid facial capture and normalization scheme to the source video input, while our explicit alpha and OLAT outputs enable high-quality relighting and background editing. With the ability to achieve simultaneous relighting and reenactment, we are able to improve the realism in a variety of virtual production and video rewrite applications.
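The relighting and background editing described above rest on two standard operations: by the linearity of light transport, a portrait under any target environment can be expressed as a weighted sum of OLAT ("one light at a time") images, and the predicted alpha matte then composites the relit foreground onto a new background. A minimal NumPy sketch of these two steps (the function names and array shapes are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def relight(olat_images, light_weights):
    """Relight by a linear combination of OLAT images.

    olat_images:   (L, H, W, 3) array, one image per stage light
    light_weights: (L, 3) array, RGB intensity of the target
                   environment sampled at each light's direction
    """
    # Sum over the L lights: each OLAT frame scaled per-channel
    # by its environment weight (linearity of light transport).
    return np.einsum('lhwc,lc->hwc', olat_images, light_weights)

def composite(foreground, alpha, background):
    """Standard alpha compositing onto a new background.

    alpha: (H, W, 1) matte with values in [0, 1]
    """
    return alpha * foreground + (1.0 - alpha) * background

# Tiny synthetic example: 4 lights, a 2x2 image.
L, H, W = 4, 2, 2
rng = np.random.default_rng(0)
olat = rng.random((L, H, W, 3))       # stand-in for inferred OLAT imageset
weights = rng.random((L, 3))          # stand-in for a sampled environment map
relit = relight(olat, weights)        # (H, W, 3) relit foreground
out = composite(relit,
                np.full((H, W, 1), 0.5),   # stand-in alpha matte
                np.zeros((H, W, 3)))       # new background
```

In the paper's pipeline the OLAT imageset and the alpha matte are both inferred per frame by the rendering-to-video translation network, so these two operations can be applied to any captured or edited performance.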

Updated: 2021-08-02