Controlled AutoEncoders to Generate Faces from Voices
arXiv - CS - Sound. Pub Date: 2021-07-16, DOI: arxiv-2107.07988
Hao Liang, Lulan Yu, Guikang Xu, Bhiksha Raj, Rita Singh

Multiple studies in the past have shown that there is a strong correlation between human vocal characteristics and facial features. However, existing approaches generate faces directly from voice, without exploring the set of features that contribute to these observed correlations. A computational methodology to explore this can be devised by rephrasing the question as: "how much would a target face have to change in order to be perceived as the originator of a source voice?" With this in perspective, we propose in this paper a framework that morphs a target face in response to a given voice, such that the facial features are implicitly guided by the learned voice-face correlation. Our framework includes a guided autoencoder that converts one face to another, controlled by a unique model-conditioning component called a gating controller, which modifies the reconstructed face based on input voice recordings. We evaluate the framework on the VoxCeleb and VGGFace datasets through human-subject studies and face retrieval. Various experiments demonstrate the effectiveness of our proposed model.
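To make the described architecture concrete, the following is a minimal, hypothetical PyTorch sketch of a face autoencoder whose latent code is modulated by a gating controller driven by a voice embedding. All layer sizes, the multiplicative gating form, and the module names are illustrative assumptions; the paper's actual architecture, losses, and training procedure are not specified in this abstract.

```python
# Hypothetical sketch of a voice-gated face autoencoder (not the authors' code).
import torch
import torch.nn as nn

class GatedFaceAutoencoder(nn.Module):
    def __init__(self, face_dim=3 * 64 * 64, latent_dim=128, voice_dim=256):
        super().__init__()
        # Encoder: flattened face image -> latent code
        self.encoder = nn.Sequential(
            nn.Linear(face_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),
        )
        # Gating controller: voice embedding -> per-dimension gates in (0, 1)
        self.gating_controller = nn.Sequential(
            nn.Linear(voice_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim), nn.Sigmoid(),
        )
        # Decoder: gated latent code -> reconstructed / morphed face
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, face_dim), nn.Sigmoid(),
        )

    def forward(self, face, voice_embedding):
        z = self.encoder(face)                        # target-face latent
        gates = self.gating_controller(voice_embedding)
        z_conditioned = z * gates                     # voice modulates the latent
        return self.decoder(z_conditioned)

# Usage with random tensors: a batch of 4 flattened 64x64 RGB faces
# and 4 assumed 256-dimensional voice embeddings.
model = GatedFaceAutoencoder()
faces = torch.rand(4, 3 * 64 * 64)
voices = torch.randn(4, 256)
morphed = model(faces, voices)
print(morphed.shape)  # torch.Size([4, 12288])
```

The design choice sketched here, elementwise gates applied to the face latent, is one simple way a conditioning signal can control which facial features the decoder is allowed to alter; the paper may realize the gating controller differently.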

Updated: 2021-07-19