WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss


arXiv - CS - Sound. Pub Date: 2020-02-02, DOI: arxiv-2002.00417
Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li

Tacotron-based text-to-speech (TTS) systems directly synthesize speech from text input. Such frameworks typically consist of a feature prediction network that maps character sequences to frequency-domain acoustic features, followed by a waveform reconstruction algorithm or a neural vocoder that generates the time-domain waveform from the acoustic features. Because the loss function is usually calculated only on the frequency-domain acoustic features, it does not directly control the quality of the generated time-domain waveform. To address this problem, we propose a new training scheme for Tacotron-based TTS, referred to as WaveTTS, that has two loss functions: 1) a time-domain loss, denoted the waveform loss, that measures the distortion between the natural and generated waveforms; and 2) a frequency-domain loss that measures the Mel-scale acoustic feature loss between the natural and generated acoustic features. WaveTTS ensures the quality of both the acoustic features and the resulting speech waveform. To the best of our knowledge, this is the first implementation of Tacotron with a joint time-frequency domain loss. Experimental results show that the proposed framework outperforms the baselines and achieves high-quality synthesized speech.
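The idea of a joint time-frequency domain loss can be illustrated with a small sketch. The abstract does not say which distortion measure is used for either term or how the two terms are combined, so everything below is an assumption made for illustration: both terms use mean squared error, they are combined as a weighted sum, and the names `mean_squared_error`, `joint_loss`, and `wave_weight` are hypothetical. The sketch only computes loss values on plain Python lists; an actual training setup would also need the generated waveform to be produced differentiably from the predicted features.

```python
from typing import Sequence


def mean_squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Average squared difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if len(a) == 0:
        raise ValueError("sequences must be non-empty")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_loss(
    mel_ref: Sequence[Sequence[float]],
    mel_gen: Sequence[Sequence[float]],
    wav_ref: Sequence[float],
    wav_gen: Sequence[float],
    wave_weight: float = 1.0,
) -> float:
    """Hypothetical joint loss: frequency-domain term plus weighted time-domain term.

    Assumption: MSE for both terms and a weighted sum; the source does not
    specify either choice.
    """
    # Frequency-domain term: flatten the (frames x mel bins) features and compare.
    flat_ref = [v for frame in mel_ref for v in frame]
    flat_gen = [v for frame in mel_gen for v in frame]
    freq_loss = mean_squared_error(flat_ref, flat_gen)

    # Time-domain term: compare natural and generated waveform samples.
    time_loss = mean_squared_error(wav_ref, wav_gen)

    return freq_loss + wave_weight * time_loss


if __name__ == "__main__":
    mel_ref = [[0.0, 1.0], [2.0, 3.0]]
    mel_gen = [[0.0, 1.0], [2.0, 5.0]]
    wav_ref = [0.0, 0.5]
    wav_gen = [0.0, -0.5]
    print(joint_loss(mel_ref, mel_gen, wav_ref, wav_gen, wave_weight=2.0))
```

With `wave_weight` set to 0 the sketch reduces to a frequency-domain-only loss, which is the usual setup the abstract describes as not directly controlling waveform quality.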


Updated: 2020-04-08
