Building a Controllable Expressive Speech Synthesis System with Multiple Emotion Strengths
Cognitive Systems Research ( IF 3.9 ) Pub Date : 2020-01-01 , DOI: 10.1016/j.cogsys.2019.09.009
Xiaolian Zhu , Liumeng Xue

Abstract Emotion is considered an essential element in the performance of human-computer interaction. In expressive speech synthesis, it is important to generate emotional speech that reflects subtle and complex emotional states. However, there has been limited research on synthesizing emotional speech at different levels of emotion strength with intuitive control, a task that is difficult to model effectively. In this paper, we explore an expressive speech synthesis model that can produce speech with multiple emotion strengths. Unlike previous studies that encoded emotions into discrete codes, we propose an embedding vector that controls the emotion strength continuously, a data-driven method for synthesizing speech with fine control over emotion. Compared with models that use the retraining technique or a one-hot vector, our proposed model using an embedding vector can explicitly learn the high-level emotion strength from the low-level acoustic features. As a result, we can control the emotion strength of synthetic speech in a relatively predictable and globally consistent way. Objective and subjective evaluation tests show that our proposed model achieves state-of-the-art performance in terms of model flexibility and controllability.
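The contrast the abstract draws between discrete codes and a continuous embedding can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not describe the architecture, so the number of levels, the embedding size, the random stand-in for a learned table, and the linear interpolation in `strength_embedding` are all assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical setup: three discrete strength levels (e.g. weak, medium, strong).
NUM_LEVELS = 3
EMBED_DIM = 4

def one_hot_code(level: int) -> np.ndarray:
    """Discrete conditioning: only the integer levels 0..NUM_LEVELS-1 exist."""
    code = np.zeros(NUM_LEVELS)
    code[level] = 1.0
    return code

# Stand-in for a learned embedding table; random values, illustrative only.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(NUM_LEVELS, EMBED_DIM))

def strength_embedding(strength: float) -> np.ndarray:
    """Continuous conditioning (assumed scheme): linearly interpolate between
    the embeddings of the two neighbouring levels, for any strength in
    [0, NUM_LEVELS - 1]. Out-of-range values are clipped."""
    strength = float(np.clip(strength, 0.0, NUM_LEVELS - 1))
    lo = int(np.floor(strength))
    hi = min(lo + 1, NUM_LEVELS - 1)
    frac = strength - lo
    return (1.0 - frac) * embedding_table[lo] + frac * embedding_table[hi]

if __name__ == "__main__":
    print(one_hot_code(1))          # discrete code for the middle level
    print(strength_embedding(1.0))  # same level as a dense vector
    print(strength_embedding(1.3))  # an in-between strength, no one-hot equivalent
```

A one-hot code can only name the levels it was built for, whereas a dense vector admits intermediate values such as 1.3, which is the kind of fine-grained control the abstract refers to.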

Updated: 2020-01-01