Digital Einstein Experience: Fast Text-to-Speech for Conversational AI
arXiv - CS - Sound. Pub Date: 2021-07-21, DOI: arxiv-2107.10658
Joanna Rownicka, Kilian Sprenkamp, Antonio Tripiana, Volodymyr Gromoglasov, Timo P Kunz

We describe our approach to creating and delivering a custom voice for a conversational AI use case. More specifically, we provide a voice for a Digital Einstein character to enable human-computer interaction within the digital conversation experience. To create a voice that fits the context well, we first design a voice character and produce recordings that correspond to the desired speech attributes. We then model the voice. Our solution uses FastSpeech 2 to predict log-scaled mel-spectrograms from phonemes and Parallel WaveGAN to generate the waveforms. The system takes character input and produces a speech waveform as output. We use a custom dictionary for selected words to ensure their proper pronunciation. Our proposed cloud architecture enables fast voice delivery, making it possible to talk to the digital version of Albert Einstein in real time.
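
The custom-dictionary step in the abstract can be illustrated with a minimal sketch of a text front end that turns character input into phonemes, preferring hand-written entries for selected words. Everything named here is an assumption for illustration only: `CUSTOM_DICT`, `fallback_g2p`, and `text_to_phonemes` are hypothetical names, the ARPAbet-style phonemes are example values rather than the authors' dictionary, and the fallback is a placeholder standing in for a real grapheme-to-phoneme model. The acoustic model and vocoder stages are not shown.

```python
import re

# Hypothetical custom entries. The phoneme strings are illustrative
# ARPAbet-style values, not the dictionary used by the authors.
CUSTOM_DICT = {
    "einstein": ["AY1", "N", "S", "T", "AY0", "N"],
}


def fallback_g2p(word):
    """Placeholder for a real grapheme-to-phoneme model.

    It simply spells the word letter by letter so the sketch stays
    self-contained; a production system would predict phonemes here.
    """
    return [ch.upper() for ch in word if ch.isalpha()]


def text_to_phonemes(text, custom_dict=CUSTOM_DICT, g2p=fallback_g2p):
    """Convert character input to a flat phoneme list.

    Words found in the custom dictionary use their hand-written
    pronunciation; all other words go through the fallback converter.
    """
    phonemes = []
    for word in re.findall(r"[A-Za-z']+", text):
        key = word.lower()
        if key in custom_dict:
            phonemes.extend(custom_dict[key])
        else:
            phonemes.extend(g2p(key))
    return phonemes
```

In a pipeline like the one described, the resulting phoneme sequence would be the input to the mel-spectrogram predictor, whose output the vocoder then converts to a waveform.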

Updated: 2021-07-23