Fast Griffin Lim based Waveform Generation Strategy for Text-to-Speech Synthesis
arXiv - CS - Multimedia. Pub Date: 2020-07-11, DOI: arxiv-2007.05764
Ankit Sharma, Puneet Kumar, Vikas Maddukuri, Nagasai Madamshettib, Kishore KG, Sahit Sai Sriram Kavurub, Balasubramanian Raman and Partha Pratim Roy
The performance of text-to-speech (TTS) systems depends heavily on
spectrogram-to-waveform generation, also known as the speech reconstruction
phase. The time this phase takes is known as synthesis delay. This paper
proposes an approach to reduce speech synthesis delay. It aims to improve TTS
systems for real-time applications such as digital assistants, mobile phones,
and embedded devices. The proposed approach uses the Fast Griffin Lim
Algorithm (FGLA) instead of the Griffin Lim Algorithm (GLA) as the vocoder in
the speech synthesis phase. GLA and FGLA are both iterative, but FGLA
converges faster than GLA. The proposed approach is tested on the LJSpeech,
Blizzard and Tatoeba datasets, and the results for FGLA are compared against
GLA and a neural Generative Adversarial Network (GAN) based vocoder.
Performance is evaluated in terms of synthesis delay and speech quality. A
36.58% reduction in speech synthesis delay is observed. Output speech quality
also improves, as indicated by higher mean opinion scores (MOS) and faster
convergence for FGLA than for GLA.
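The abstract contrasts two iterative phase-reconstruction schemes without spelling out the update rule. The sketch below is one common formulation, not the authors' implementation: each iteration imposes the target magnitude, projects back onto the set of consistent spectrograms (STFT of the inverse STFT), and, for FGLA, adds a momentum step weighted by `alpha`. Setting `alpha=0` gives the classic GLA. The window, FFT size, hop size, `alpha=0.99`, the toy sine input and all function names are assumptions chosen for illustration. The paper works from TTS spectrograms, whereas this sketch uses a plain linear magnitude spectrogram.

```python
import numpy as np

N_FFT, HOP = 256, 64                      # assumed analysis parameters
WIN = np.hanning(N_FFT + 1)[:-1]          # periodic Hann window


def stft(x):
    """Windowed frames followed by a full complex FFT; shape (n_frames, N_FFT)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WIN for i in range(n_frames)])
    return np.fft.fft(frames, axis=1)


def istft(c):
    """Least-squares inverse STFT: windowed overlap-add over summed squared windows."""
    frames = np.fft.ifft(c, axis=1).real
    length = N_FFT + HOP * (c.shape[0] - 1)
    y, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * HOP:i * HOP + N_FFT] += frame * WIN
        norm[i * HOP:i * HOP + N_FFT] += WIN ** 2
    return y / np.maximum(norm, 1e-12)    # floor avoids division by zero at the edge


def griffin_lim(mag, n_iter=50, alpha=0.0, seed=0):
    """Estimate a waveform from magnitudes `mag`.

    alpha = 0 is the classic GLA; alpha > 0 adds the FGLA momentum term.
    Returns the waveform and the relative magnitude error after each iteration.
    """
    rng = np.random.default_rng(seed)
    phase = rng.uniform(-np.pi, np.pi, mag.shape)
    c_prev = stft(istft(mag * np.exp(1j * phase)))   # consistent starting point
    t = c_prev
    errors = []
    for _ in range(n_iter):
        projected = mag * np.exp(1j * np.angle(t))   # impose the target magnitude
        c = stft(istft(projected))                   # nearest consistent spectrogram
        t = c + alpha * (c - c_prev)                 # momentum step (no-op if alpha == 0)
        c_prev = c
        errors.append(np.linalg.norm(np.abs(c) - mag) / np.linalg.norm(mag))
    return istft(c), errors


if __name__ == "__main__":
    # Toy input: a 220 Hz sine sampled at 8 kHz (illustrative only, not speech).
    signal = np.sin(2 * np.pi * 220 * np.arange(4000) / 8000)
    mag = np.abs(stft(signal))
    _, gla_err = griffin_lim(mag, alpha=0.0)
    _, fgla_err = griffin_lim(mag, alpha=0.99)
    print("GLA  final relative error:", gla_err[-1])
    print("FGLA final relative error:", fgla_err[-1])
```

Both variants cost the same per iteration (one inverse STFT and one STFT), so in this formulation any saving in synthesis delay would come from needing fewer iterations to reach a given error. The error printed here is only a proxy for convergence. It says nothing about perceived quality, which the paper measures with MOS.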
Updated: 2020-07-14