Mel-spectrogram augmentation for sequence to sequence voice conversion
arXiv - CS - Sound Pub Date : 2020-01-06 , DOI: arxiv-2001.01401
Yeongtae Hwang, Hyemin Cho, Hongsun Yang, Dong-Ok Won, Insoo Oh, and Seong-Whan Lee

To train a sequence-to-sequence voice conversion (VC) model, we must cope with an insufficient number of speech pairs that consist of the same utterance. This study experimentally investigates the effect of Mel-spectrogram augmentation on training a sequence-to-sequence VC model from scratch. For Mel-spectrogram augmentation, we adopt the policies proposed in SpecAugment. In addition, we propose new policies (frequency warping, loudness control, and time length control) to introduce more data variation. To find appropriate hyperparameters for the augmentation policies without training the VC model, we propose a hyperparameter search strategy and a new metric, the deformation per deteriorating ratio, which reduces experimental cost. We compare the Mel-spectrogram augmentation methods across various training set sizes and augmentation policies. In the experiments, the policies based on warping along the time axis (time length control and time warping) performed better than the other policies. These results indicate that Mel-spectrogram augmentation is beneficial for training the VC model.
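As a rough illustration of the kinds of policies named in the abstract, the sketch below applies SpecAugment-style frequency and time masking and a simple time length change to a Mel-spectrogram stored as a NumPy array of shape `(n_mels, n_frames)`. It is not the authors' implementation: the function names, the `fill` value, the `length_scale` parameter, and the use of linear interpolation for time length control are all assumptions made for this example.

```python
import numpy as np


def freq_mask(mel, max_width, rng, fill=0.0):
    """Mask one band of consecutive Mel channels (SpecAugment-style).

    The band width is drawn uniformly from [0, max_width]; `fill` is an
    assumed mask value, not one taken from the paper.
    """
    out = mel.copy()
    n_mels = out.shape[0]
    width = min(int(rng.integers(0, max_width + 1)), n_mels)
    start = int(rng.integers(0, n_mels - width + 1))
    out[start:start + width, :] = fill
    return out


def time_mask(mel, max_width, rng, fill=0.0):
    """Mask one run of consecutive frames (SpecAugment-style)."""
    out = mel.copy()
    n_frames = out.shape[1]
    width = min(int(rng.integers(0, max_width + 1)), n_frames)
    start = int(rng.integers(0, n_frames - width + 1))
    out[:, start:start + width] = fill
    return out


def time_length_control(mel, length_scale):
    """Stretch or compress the time axis by `length_scale`.

    Assumption: each Mel channel is resampled with linear interpolation;
    the paper may use a different resampling scheme.
    """
    n_mels, n_frames = mel.shape
    new_frames = max(1, int(round(n_frames * length_scale)))
    src = np.arange(n_frames)
    dst = np.linspace(0, n_frames - 1, new_frames)
    return np.stack([np.interp(dst, src, row) for row in mel])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mel = rng.standard_normal((80, 100))  # stand-in for a real Mel-spectrogram
    augmented = time_length_control(time_mask(freq_mask(mel, 8, rng), 20, rng), 1.1)
    print(augmented.shape)
```

Each function returns a new array and leaves its input untouched, so the policies can be chained or compared in isolation. How an augmentation should be applied to the source and target utterances of a training pair is not specified in the abstract and is left open here.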

Updated: 2020-06-16