Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS, IEEE Signal Processing Letters



IEEE Signal Processing Letters (IF 3.9), Pub Date: 2020-01-01, DOI: 10.1109/lsp.2020.3016564
Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li

Tacotron-based end-to-end speech synthesis has shown remarkable voice quality. However, the rendering of prosody in the synthesized speech remains to be improved, especially for long sentences, where prosodic phrasing errors can occur frequently. In this letter, we extend the Tacotron-based speech synthesis framework to explicitly model prosodic phrase breaks. We propose a multi-task learning scheme for Tacotron training that optimizes the system to predict both the Mel spectrum and phrase breaks. To the best of our knowledge, this is the first implementation of multi-task learning for Tacotron-based TTS with a prosodic phrasing model. Experiments show that our proposed training scheme consistently improves the voice quality for both Chinese and Mongolian systems.
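The abstract does not give the form of the training objective, so the following is only a minimal sketch of what a joint objective over the two tasks could look like. It assumes an L1 loss on Mel frames, a cross-entropy loss on per-token phrase-break classes, and a weighted sum of the two. The function names, the loss choices, and the `break_weight` value are all hypothetical illustrations, not the authors' implementation.

```python
import math


def mel_l1_loss(pred, target):
    """Mean absolute error over all Mel bins of all frames (assumed loss choice)."""
    total, count = 0.0, 0
    for pred_frame, target_frame in zip(pred, target):
        for p, t in zip(pred_frame, target_frame):
            total += abs(p - t)
            count += 1
    return total / count


def break_cross_entropy(logits, labels):
    """Mean cross-entropy over per-token phrase-break classes (assumed loss choice)."""
    total = 0.0
    for row, label in zip(logits, labels):
        # Numerically stable log-sum-exp of the class scores.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]
    return total / len(labels)


def multitask_loss(mel_pred, mel_target, break_logits, break_labels,
                   break_weight=0.5):
    """Weighted sum of both task losses; break_weight is a hypothetical hyperparameter."""
    mel = mel_l1_loss(mel_pred, mel_target)
    brk = break_cross_entropy(break_logits, break_labels)
    return mel + break_weight * brk, mel, brk


# Toy example: 2 Mel frames with 2 bins each, 2 tokens with 2 break classes.
total, mel, brk = multitask_loss(
    mel_pred=[[0.0, 1.0], [2.0, 3.0]],
    mel_target=[[0.0, 1.0], [2.0, 4.0]],
    break_logits=[[0.0, 0.0], [0.0, 0.0]],
    break_labels=[0, 1],
)
print(total, mel, brk)
```

In a sketch like this, both terms would be backpropagated through whatever network layers the two tasks share, which is how a phrase-break signal could shape the representation used for Mel prediction. Which layers are shared in the actual system is not stated in the abstract.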


Updated: 2020-01-01
