Learning Dynamic Textures for Neural Rendering of Human Actors
IEEE Transactions on Visualization and Computer Graphics (IF 5.2). Pub Date: 2020-05-26. DOI: 10.1109/tvcg.2020.2996594
Lingjie Liu, Weipeng Xu, Marc Habermann, Michael Zollhöfer, Florian Bernard, Hyeongwoo Kim, Wenping Wang, Christian Theobalt

Synthesizing realistic videos of humans using neural networks has become a popular alternative to the conventional graphics-based rendering pipeline due to its high efficiency. Existing works typically formulate this as an image-to-image translation problem in 2D screen space, which leads to artifacts such as over-smoothing, missing body parts, and temporal instability of fine-scale detail, such as pose-dependent wrinkles in the clothing. In this article, we propose a novel human video synthesis method that addresses these limitations by explicitly disentangling the learning of time-coherent fine-scale details from the embedding of the human in 2D screen space. More specifically, our method relies on the combination of two convolutional neural networks (CNNs). Given the pose information, the first CNN predicts a dynamic texture map that contains time-coherent high-frequency details, and the second CNN conditions the generation of the final video on the temporally coherent output of the first CNN. We demonstrate several applications of our approach, such as human reenactment and novel view synthesis from monocular video, where we show significant improvement over the state of the art both qualitatively and quantitatively.
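The two-stage pipeline described above can be sketched schematically. This is a minimal illustration, not the authors' implementation: `tex_net` and `render_net` are hypothetical placeholders standing in for the two learned CNNs, the 72-dimensional pose vector is an assumed SMPL-style joint-angle encoding, and the texture is faked with a pose-dependent sinusoidal pattern in place of learned high-frequency detail.

```python
import numpy as np

def tex_net(pose: np.ndarray) -> np.ndarray:
    """Stage 1 (placeholder for the first CNN): map a pose vector to a
    dynamic texture map in the body's UV parameterization. A sinusoidal
    pattern whose phase depends on the pose stands in for learned,
    time-coherent high-frequency detail such as clothing wrinkles."""
    h = w = 64
    u, v = np.meshgrid(np.linspace(0.0, 1.0, w), np.linspace(0.0, 1.0, h))
    phase = float(pose.sum())
    tex = 0.5 + 0.5 * np.sin(20.0 * u + 20.0 * v + phase)
    return np.stack([tex] * 3, axis=-1)  # (H, W, 3) RGB texture map

def render_net(texture: np.ndarray, screen_embedding: np.ndarray) -> np.ndarray:
    """Stage 2 (placeholder for the second CNN): synthesize the final
    frame conditioned on the temporally coherent texture. Here we simply
    modulate a 2D screen-space embedding by the mean texture intensity."""
    return screen_embedding * texture.mean()

pose = np.zeros(72)                 # assumed SMPL-style pose vector
embedding = np.ones((64, 64, 3))    # assumed 2D screen-space embedding
frame = render_net(tex_net(pose), embedding)
print(frame.shape)  # (64, 64, 3)
```

The key design point the sketch preserves is that temporal coherence is enforced in the texture (UV) domain by the first stage, so the second stage only has to map an already-coherent signal into screen space.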

Updated: 2020-05-26