Facial expression GAN for voice-driven face generation, The Visual Computer

Facial expression GAN for voice-driven face generation
The Visual Computer (IF 3.0), Pub Date: 2021-02-22, DOI: 10.1007/s00371-021-02074-w
Zheng Fang, Zhen Liu, Tingting Liu, Chih-Chieh Hung, Jiangjian Xiao, Guangjin Feng

Cross-modal audiovisual generation is an emerging topic in machine learning. Voice-to-face generation, which aims to generate faces from human voice clips, is one of its most popular research branches. Most recent work on voice-to-face generation does not take emotion information into account, even though expressions are widely observed to be key facial attributes for reconstructing sharper and more discriminative faces. In this paper, we propose a novel facial expression GAN (FE-GAN) that takes emotion and expressions into account in face generation. To achieve this goal, we use two auxiliary classifiers to learn emotion and identity representations, respectively, across the different modalities. Moreover, we design two discriminators, each focusing on a different aspect of the faces, to measure the semantic relevance of identity and emotion during generation. The triple loss is designed to make FE-GAN robust to voice variety and to keep the two modalities balanced. Extensive experiments are conducted on two real datasets to demonstrate the effectiveness of FE-GAN from both quantitative and qualitative perspectives. The experimental results show that FE-GAN not only outperforms previous models in terms of FID and IS values, but also generates more realistic face images.
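For readers unfamiliar with the FID metric cited in the abstract, the following is a minimal sketch of the Fréchet distance between two Gaussians fitted to feature sets, which is the quantity FID reports. It is not the authors' evaluation code. In standard FID the features come from a pretrained Inception network; here the function name `frechet_distance` and the random placeholder features are assumptions made only for illustration.

```python
import numpy as np


def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is a 2-D array with one row per sample. In real FID the rows
    would be Inception features of real and generated images (assumption:
    that feature extraction happens elsewhere and is not shown here).
    """
    mu_r = feats_real.mean(axis=0)
    mu_f = feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)

    # Tr((cov_r cov_f)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_f, which are real and non-negative for
    # positive semi-definite covariances (clipped to guard against round-off).
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * trace_sqrt)


# Placeholder features standing in for real and generated image embeddings.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
fake = rng.normal(loc=0.5, size=(500, 8))
score = frechet_distance(real, fake)
```

Lower values mean the two feature distributions are closer, which is why a lower FID is read as better image quality.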


Updated: 2021-02-22
