3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head

arXiv - CS - Graphics. Pub Date: 2021-04-25, DOI: arxiv-2104.12051
Qianyun Wang, Zhenfeng Fan, Shihong Xia

Impressive progress has recently been made in audio-driven 3D facial animation, but synthesizing 3D talking heads with rich emotion remains unsolved. This is due to the lack of 3D generative models and of available 3D emotional datasets with synchronized audio. To address this, we introduce 3D-TalkEmo, a deep neural network that generates 3D talking-head animation with various emotions. We also build a large 3D dataset, using sophisticated 3D face reconstruction methods, that contains synchronized audio and video, a rich corpus, and various emotional states of different people. In the emotion generation network, we propose a novel 3D face representation, the geometry map, constructed by classical multidimensional scaling (MDS) analysis. It maps the coordinates of the vertices of a 3D face to a canonical image plane while preserving the vertex-to-vertex geodesic distance metric in a least-squares sense. This maintains the adjacency relationship of each vertex and provides an effective convolutional structure for the 3D facial surface. Taking a neutral 3D mesh and a speech signal as inputs, 3D-TalkEmo is able to generate vivid facial animations. Moreover, it makes it possible to change the emotional state of the animated speaker. We present extensive quantitative and qualitative evaluations of our method, in addition to user studies, demonstrating that the generated talking heads are of significantly higher quality than those of previous state-of-the-art methods.
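
The geometry-map idea described in the abstract (embedding mesh vertices in a plane so that pairwise geodesic distances are preserved in a least-squares sense) can be illustrated with textbook classical MDS. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `graph_geodesics` and `classical_mds` are hypothetical, geodesic distances are approximated by shortest paths along mesh edges, the mesh is assumed connected, and the step that rasterizes the embedded vertices onto an image grid is not shown.

```python
import numpy as np

def graph_geodesics(vertices, edges):
    """Approximate geodesic distances by shortest paths along mesh edges.

    Assumption: Floyd-Warshall on the edge graph, which is only practical
    for small meshes and only approximates true surface geodesics.
    """
    n = len(vertices)
    d = np.full((n, n), np.inf)
    np.fill_diagonal(d, 0.0)
    for i, j in edges:
        d[i, j] = d[j, i] = np.linalg.norm(vertices[i] - vertices[j])
    for k in range(n):
        # Relax every pair (i, j) through intermediate vertex k.
        d = np.minimum(d, d[:, [k]] + d[[k], :])
    return d

def classical_mds(dist, dim=2):
    """Classical MDS: embed points in `dim` dimensions from a distance matrix."""
    n = dist.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (dist ** 2) @ J
    # B is symmetric; keep the `dim` largest eigenpairs.
    w, v = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]
    # Clip small negative eigenvalues that arise for non-Euclidean distances.
    w_top = np.clip(w[idx], 0.0, None)
    return v[:, idx] * np.sqrt(w_top)

if __name__ == "__main__":
    # Toy "mesh": four vertices of a slightly bent quad, split into two triangles.
    verts = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [1.0, 1.0, 0.3],
                      [0.0, 1.0, 0.0]])
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    geo = graph_geodesics(verts, edges)
    uv = classical_mds(geo, dim=2)  # one 2D coordinate per vertex
    print(uv)
```

On a real face mesh the resulting 2D coordinates would still have to be normalized and sampled onto a regular pixel grid before a convolutional network could consume them; how the paper does this is not stated in the abstract.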

Updated: 2021-04-27
