Emilia: a speech corpus for Argentine Spanish text to speech synthesis
Language Resources and Evaluation (IF 1.7), Pub Date: 2019-02-02, DOI: 10.1007/s10579-019-09447-7
Humberto M. Torres, Jorge A. Gurlekian, Diego A. Evin, Christian G. Cossio Mercado

This paper introduces Emilia, a speech corpus created to build a female voice in the Spanish spoken in Buenos Aires for the Aromo text-to-speech system. Aromo is a unit-selection text-to-speech system that uses diphones as its synthesis units. The key requirement and design criterion for Emilia was to synthesize any Spanish text into high-quality speech with a minimum corpus size. The text corpus was designed to guarantee phonetic and prosodic coverage, using a three-stage strategy: in the first stage, 741 sentences were designed to contain all of the syllables of the Spanish spoken in Argentina, with and without stress, and in all positions within the word; in the second stage, 852 sentences were added to balance the distribution of the diphones; and in the third and final stage, following a perceptual evaluation of the quality of the synthesized speech, 625 sentences were added to achieve the specified unit coverage and to introduce sentences with more complex syntactic and prosodic structures. Issues from all three corpus-building stages are reported. The paper also presents the results of the perceptual quality evaluations of the synthesized voice. Emilia has a duration of three hours and 15 minutes; the quality of speech synthesized from it with the Aromo system is similar to that obtained with commercial systems, with a real-time ratio of less than one.
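
The abstract does not say how sentences were chosen to reach diphone coverage. As a rough illustration only, the sketch below assumes a greedy set-cover heuristic, which is one common way to select a small set of sentences that covers a target unit inventory. The function names, the `min_count` parameter, and the toy phone sequences are all hypothetical and are not taken from the paper or from Aromo.

```python
from collections import Counter


def diphones(phones):
    """Return the adjacent phone pairs (diphones) of one phone sequence."""
    return list(zip(phones, phones[1:]))


def greedy_select(candidates, min_count=1):
    """Greedily pick sentence ids until every diphone in the candidate pool
    occurs at least `min_count` times (or as often as the pool allows).

    `candidates` maps a sentence id to its phone sequence. This is an
    assumed heuristic, not the selection procedure used for Emilia.
    """
    # How often each diphone occurs across the whole candidate pool.
    available = Counter()
    for phones in candidates.values():
        available.update(diphones(phones))

    # Occurrences still needed per diphone, capped by what the pool offers.
    remaining = {d: min(min_count, n) for d, n in available.items()}

    selected = []
    unused = dict(candidates)
    while unused:
        def gain(sid):
            # Number of still-needed diphone occurrences this sentence adds.
            counts = Counter(diphones(unused[sid]))
            return sum(min(n, remaining[d]) for d, n in counts.items())

        best = max(unused, key=gain)
        if gain(best) == 0:
            break  # no remaining sentence improves coverage
        for d, n in Counter(diphones(unused.pop(best))).items():
            remaining[d] = max(0, remaining[d] - n)
        selected.append(best)
    return selected


# Toy phone sequences (hypothetical, for illustration only).
toy = {
    "s1": ["k", "a", "s", "a"],
    "s2": ["m", "e", "s", "a"],
    "s3": ["k", "a"],
}
print(greedy_select(toy))  # ['s1', 's2']
```

Here `s3` is never selected because its only diphone is already covered by `s1`. Raising `min_count` asks for repeated occurrences of each diphone, which loosely mirrors the idea of balancing the diphone distribution rather than merely covering it.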

Updated: 2019-02-02