Development and Evaluation of Speech Synthesis System Based on Deep Learning Models,Symmetry

Symmetry (IF 2.2), Pub Date: 2021-05-07, DOI: 10.3390/sym13050819
Alakbar Valizada, Sevil Jafarova, Emin Sultanov, Samir Rustamov

This study investigates, develops, and evaluates Text-to-Speech (TTS) synthesis systems based on Deep Learning models for the Azerbaijani language. We selected and compared two state-of-the-art models, Tacotron and Deep Convolutional Text-to-Speech (DC TTS), to identify the better-performing one. Both systems were trained on a 24-hour Azerbaijani speech dataset collected and processed from a news website. To analyze the quality and intelligibility of the speech produced by the two systems, 34 listeners took part in an online survey containing subjective evaluation tests. According to the Mean Opinion Score (MOS), Tacotron produced better results for In-Vocabulary words, whereas DC TTS performed better on the synthesis of Out-Of-Vocabulary words.
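
As a rough illustration of how a Mean Opinion Score is commonly computed, the sketch below averages listener ratings and attaches a normal-approximation 95% confidence interval. The abstract does not state the rating scale or the aggregation procedure, so the 1-5 scale, the function name `mean_opinion_score`, and the sample ratings are all assumptions made for illustration; none of the numbers come from the study.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for a list of listener ratings.

    Assumes the conventional 1-5 opinion scale; the study's actual
    scale is not given in the abstract.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mos = mean(ratings)
    # Normal-approximation interval: z = 1.96 for roughly 95% coverage.
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings for two systems (NOT data from the study).
ratings_system_a = [4, 5, 3, 4, 4, 5, 4, 3]
ratings_system_b = [3, 4, 3, 4, 2, 4, 3, 3]

for name, ratings in [("A", ratings_system_a), ("B", ratings_system_b)]:
    mos, ci = mean_opinion_score(ratings)
    print(f"System {name}: MOS = {mos:.2f} +/- {ci:.2f}")
```

With only a handful of ratings per system, the intervals are wide, which is one reason listening tests of this kind usually pool scores from many listeners and sentences before comparing systems.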

Updated: 2021-05-07
