Mel-spectrogram augmentation for sequence to sequence voice conversion
arXiv - CS - Sound. Pub Date: 2020-01-06. DOI: arxiv-2001.01401. Authors: Yeongtae Hwang, Hyemin Cho, Hongsun Yang, Dong-Ok Won, Insoo Oh, and Seong-Whan Lee
Training a sequence-to-sequence voice conversion (VC) model requires pairs of
speech recordings of the same utterance, and such paired data is often scarce.
This study experimentally investigated the effects of Mel-spectrogram
augmentation on training a sequence-to-sequence VC model from scratch. For
Mel-spectrogram augmentation, we adopted the policies proposed in SpecAugment.
In addition, we proposed new policies (i.e., frequency warping, loudness
control, and time length control) to increase data variation. Moreover, to find
appropriate hyperparameters for the augmentation policies without training the
VC model, we proposed a hyperparameter search strategy and a new metric that
reduces experimental cost, namely deformation per deteriorating ratio. We
compared the effects of these Mel-spectrogram augmentation methods across
various training set sizes and augmentation policies. In the experimental
results, the policies based on time-axis warping (i.e., time length control and
time warping) performed better than the other policies. These results indicate
that Mel-spectrogram augmentation is beneficial for training the VC model.
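To make the augmentation policies named in the abstract more concrete, the following is a minimal, dependency-free sketch of four of them: SpecAugment-style time masking and frequency masking, plus loudness control and time length control. It is not the authors' implementation. The function names, the representation of a Mel-spectrogram as a nested list `spec[t][f]` of log-Mel values, the reading of loudness control as a constant additive shift in the log domain, and the reading of time length control as uniform linear resampling along the time axis are all assumptions made for illustration, since the abstract does not define them. Frequency warping and time warping are left out.

```python
import random


def time_mask(spec, max_width, rng, fill=0.0):
    """Time masking: overwrite a random run of consecutive frames."""
    n_frames = len(spec)
    width = rng.randint(0, min(max_width, n_frames))
    start = rng.randint(0, n_frames - width)
    return [
        [fill] * len(frame) if start <= t < start + width else list(frame)
        for t, frame in enumerate(spec)
    ]


def freq_mask(spec, max_width, rng, fill=0.0):
    """Frequency masking: overwrite a random band of Mel bins in every frame."""
    n_bins = len(spec[0])
    width = rng.randint(0, min(max_width, n_bins))
    start = rng.randint(0, n_bins - width)
    return [
        [fill if start <= f < start + width else v for f, v in enumerate(frame)]
        for frame in spec
    ]


def loudness_control(spec, gain):
    """Assumed form: add a constant gain to every log-Mel value."""
    return [[v + gain for v in frame] for frame in spec]


def time_length_control(spec, rate):
    """Assumed form: resample the time axis by linear interpolation.

    rate > 1 lengthens the utterance, rate < 1 shortens it.
    """
    n_frames = len(spec)
    new_len = max(1, round(n_frames * rate))
    if new_len == 1 or n_frames == 1:
        return [list(spec[0]) for _ in range(new_len)]
    out = []
    for i in range(new_len):
        pos = i * (n_frames - 1) / (new_len - 1)  # position in source frames
        lo = int(pos)
        hi = min(lo + 1, n_frames - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(spec[lo], spec[hi])])
    return out


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Toy spectrogram: 6 frames x 4 Mel bins (values are placeholders).
    spec = [[float(t + f) for f in range(4)] for t in range(6)]
    augmented = time_length_control(freq_mask(time_mask(spec, 2, rng), 1, rng), 1.5)
    print(len(spec), "frames ->", len(augmented), "frames")
```

In a paired VC setting, policies that change the number of frames (such as the time length control sketched here) alter the alignment between source and target utterances, so whether a policy is applied to the source, the target, or both is a design decision the abstract leaves open. A real pipeline would more likely operate on arrays from a numerical library than on nested lists.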
Updated: 2020-06-16