High-Fidelity Face Manipulation With Extreme Poses and Expressions
IEEE Transactions on Information Forensics and Security ( IF 6.8 ) Pub Date : 2021-01-08 , DOI: 10.1109/tifs.2021.3050065
Chaoyou Fu , Yibo Hu , Xiang Wu , Guoli Wang , Qian Zhang , Ran He

Face manipulation has shown remarkable advances with the flourishing of Generative Adversarial Networks. However, because structures and textures are difficult to control, it is challenging to model poses and expressions simultaneously, especially for extreme manipulations at high resolution. In this article, we propose a novel framework that simplifies face manipulation into two correlated stages: a boundary prediction stage and a disentangled face synthesis stage. The first stage models poses and expressions jointly via boundary images. Specifically, a conditional encoder-decoder network is employed to predict the boundary image of the target face in a semi-supervised way. Pose and expression estimators are introduced to improve the prediction performance. In the second stage, the predicted boundary image and the input face image are encoded into the structure and the texture latent spaces, respectively, by two encoder networks. A proxy network and a feature threshold loss are further imposed to disentangle the latent space. Furthermore, because no high-resolution face manipulation database exists to verify the effectiveness of our method, we collect a new high-quality Multi-View Face (MVF-HQ) database. It contains 120,283 images at $6000\times 4000$ resolution from 479 identities with diverse poses, expressions, and illuminations. MVF-HQ is much larger in scale and much higher in resolution than publicly available high-resolution face manipulation databases. We will release MVF-HQ soon to push forward the advance of face manipulation. Qualitative and quantitative experiments on four databases show that our method dramatically improves the synthesis quality.
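The two-stage data flow described in the abstract can be sketched as follows. This is a minimal structural sketch only: the function names, latent and code dimensions, and NumPy placeholder bodies are illustrative assumptions, not the authors' implementation (which uses trained conditional encoder-decoder and GAN networks).

```python
# Hypothetical sketch of the two-stage pipeline from the abstract.
# All network bodies are NumPy stand-ins; names, shapes, and code
# sizes are assumptions for illustration, not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

def boundary_predictor(face, pose_code, expr_code):
    """Stage 1: a conditional encoder-decoder predicts the target
    face's boundary image from the source face plus target pose and
    expression codes (placeholder: random single-channel map)."""
    h, w, _ = face.shape
    return rng.random((h, w, 1))

def structure_encoder(boundary, dim=128):
    """Stage 2: encodes the predicted boundary image into the
    structure latent space (placeholder latent vector)."""
    return rng.random(dim)

def texture_encoder(face, dim=128):
    """Stage 2: encodes the input face into the texture latent
    space (placeholder latent vector)."""
    return rng.random(dim)

def decoder(structure_latent, texture_latent, out_hw=(256, 256)):
    """Fuses the disentangled structure and texture latents into
    the manipulated output face (placeholder image)."""
    h, w = out_hw
    return rng.random((h, w, 3))

# End-to-end data flow of the framework
face = rng.random((256, 256, 3))          # input face image
pose, expr = rng.random(3), rng.random(10)  # illustrative code sizes
boundary = boundary_predictor(face, pose, expr)
output = decoder(structure_encoder(boundary), texture_encoder(face))
print(boundary.shape, output.shape)
```

In the real system, the proxy network and feature threshold loss would act on the two latent vectors during training to keep structure and texture disentangled; they do not change the inference-time flow sketched here.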

Updated: 2021-02-12