How do Voices from Past Speech Synthesis Challenges Compare Today?
arXiv - CS - Sound Pub Date : 2021-05-05 , DOI: arxiv-2105.02373
Erica Cooper, Junichi Yamagishi

Shared challenges provide a venue for comparing systems trained on common data using a standardized evaluation, and they also provide an invaluable resource for researchers when the data and evaluation results are publicly released. The Blizzard Challenge and Voice Conversion Challenge are two such challenges for text-to-speech synthesis and for speaker conversion, respectively, and their publicly available system samples and listening test results comprise a historical record of state-of-the-art synthesis methods over the years. In this paper, we revisit these past challenges and conduct a large-scale listening test with samples from many challenges combined. Our aims are to analyze and compare opinions of a large number of systems together, to determine whether and how opinions change over time, and to collect a large-scale dataset of a diverse variety of synthetic samples and their ratings for further research. At the system level, we found strong correlations, challenge by challenge, between the original results and our new listening test. We also observed the importance of the choice of speaker on synthesis quality.

Updated: 2021-05-07