Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams
arXiv - CS - Multimedia. Pub Date: 2020-06-20, DOI: arxiv-2006.11610
Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng

Generating 3D speech-driven talking heads has received increasing attention in recent years. Recent approaches mainly have the following limitations: 1) most speaker-independent methods rely on handcrafted features that are time-consuming to design or unreliable; 2) there is no convincing method that supports multilingual or mixlingual speech as input. In this work, we propose a novel approach using phonetic posteriorgrams (PPG). In this way, our method does not need handcrafted features and is more robust to noise than recent approaches. Furthermore, our method can support multilingual speech as input by building a universal phoneme space. To the best of our knowledge, our model is the first to support multilingual/mixlingual speech as input with convincing results. Objective and subjective experiments have shown that our model can generate high-quality animations given speech from unseen languages or speakers and is robust to noise.
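Since only the abstract is available here, the following is a minimal, hypothetical sketch of the two-stage pipeline it describes: an acoustic model that turns per-frame speech features into phonetic posteriorgrams over a shared (universal) phoneme set, and an animation model that regresses facial animation parameters from those PPG frames. The module names, feature dimensions, phoneme-set size, and blendshape output are illustrative assumptions, not the paper's actual architecture.

# Hypothetical sketch of a PPG-driven talking-head pipeline (not the paper's code).
import torch
import torch.nn as nn

N_PHONEMES = 218      # assumed size of a universal (multilingual) phoneme set
N_BLENDSHAPES = 51    # assumed number of facial animation parameters

class AcousticModel(nn.Module):
    """Maps per-frame spectral features to phonetic posteriorgrams (PPGs)."""
    def __init__(self, feat_dim=80, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, N_PHONEMES),
        )

    def forward(self, feats):  # feats: (T, feat_dim)
        # Softmax over phoneme classes gives a per-frame posterior distribution,
        # which is largely speaker- and language-independent.
        return torch.softmax(self.net(feats), dim=-1)  # (T, N_PHONEMES)

class AnimationModel(nn.Module):
    """Regresses facial animation parameters from PPG frames."""
    def __init__(self, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(N_PHONEMES, hidden, batch_first=True)
        self.out = nn.Linear(hidden, N_BLENDSHAPES)

    def forward(self, ppg):  # ppg: (T, N_PHONEMES)
        h, _ = self.rnn(ppg.unsqueeze(0))      # add a batch dimension
        return self.out(h).squeeze(0)          # (T, N_BLENDSHAPES)

if __name__ == "__main__":
    frames = torch.randn(100, 80)              # 100 frames of 80-dim mel features (dummy input)
    ppg = AcousticModel()(frames)              # speech -> phonetic posteriorgrams
    anim = AnimationModel()(ppg)               # PPGs -> per-frame animation parameters
    print(ppg.shape, anim.shape)

Because the intermediate PPG representation abstracts away speaker identity and maps all languages into one phoneme space, the same animation model can, in principle, be driven by speech from unseen speakers or mixed-language input.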

Updated: 2020-06-23