A novel method for Mandarin speech synthesis by inserting prosodic structure prediction into Tacotron2
International Journal of Machine Learning and Cybernetics (IF 5.6), Pub Date: 2021-07-05, DOI: 10.1007/s13042-021-01365-x
Junmin Liu¹, Zhuangzhuang Xie¹,², Chunxia Zhang¹, Guang Shi¹

Speech synthesis, an artificial intelligence technology that uses computers to imitate human speech, plays a crucial role in human–computer interaction because it can automatically convert text into speech with satisfactory intelligibility and naturalness. Tacotron2 is the second-generation end-to-end English speech synthesis model developed by Google. As Mandarin becomes increasingly popular worldwide, Mandarin speech synthesis has been deployed in a wide range of applications. To extend Tacotron2 to Mandarin, we propose a novel synthesis method that adds a Mandarin-to-PinYin module and a prosodic structure prediction model to Tacotron2. Subjective and objective evaluations of the synthesized speech show that the added prosodic structure prediction model helps Tacotron2 synthesize more natural and human-like Mandarin speech.
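The abstract does not specify how the two added modules are implemented, so the following is only a minimal sketch of the general idea: Mandarin text is converted to tone-numbered pinyin, and prosodic boundary tags are interleaved before the sequence reaches the acoustic model. Everything here is an assumption for illustration: the toy lexicon, the function names, the `#1`/`#2`/`#3` boundary tags, and the hand-written boundary positions, which stand in for the output of a trained prosodic structure predictor.

```python
# Hypothetical toy lexicon covering only the demo sentence.
# A real Mandarin-to-pinyin module must also handle polyphonic
# characters and tone sandhi, which this sketch ignores.
TOY_LEXICON = {
    "我": "wo3", "们": "men5", "喜": "xi3", "欢": "huan1",
    "语": "yu3", "音": "yin1", "合": "he2", "成": "cheng2",
}


def to_pinyin(text):
    """Map each character to tone-numbered pinyin (KeyError if unknown)."""
    return [TOY_LEXICON[ch] for ch in text]


def insert_boundaries(syllables, boundaries):
    """Insert a prosodic boundary tag after selected syllables.

    `boundaries` maps a 0-based syllable index to a tag such as "#1".
    In a full system these positions would come from a trained
    prosodic structure prediction model, not from a hand-written dict.
    """
    out = []
    for i, syllable in enumerate(syllables):
        out.append(syllable)
        if i in boundaries:
            out.append(boundaries[i])
    return out


def build_model_input(text, boundaries):
    """Build the space-separated token string fed to the acoustic model."""
    return " ".join(insert_boundaries(to_pinyin(text), boundaries))


if __name__ == "__main__":
    # Assumed boundaries: after "我们", after "喜欢", and at sentence end.
    print(build_model_input("我们喜欢语音合成", {1: "#1", 3: "#2", 7: "#3"}))
```

Representing boundaries as extra tokens in the input sequence is one simple way to expose prosodic structure to a sequence-to-sequence model; whether the paper uses this encoding is not stated in the abstract.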

Updated: 2021-07-05
