Factorized WaveNet for voice conversion with limited data
Speech Communication (IF 3.2), Pub Date: 2021-04-15, DOI: 10.1016/j.specom.2021.03.003
Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li

WaveNet was introduced for waveform generation. It produces high-quality output in text-to-speech synthesis, music generation, and voice conversion. However, it generally requires a large amount of training data, which limits its scope of application, for example in voice conversion. In this paper, we propose a factorized WaveNet for limited-data tasks. Specifically, we apply singular value decomposition (SVD) to the dilated convolution layers of WaveNet to reduce the number of parameters. By doing so, we reduce the amount of data required for WaveNet training while maintaining similar network performance. We use voice conversion as a case study to validate the proposed idea. Two sets of experiments are conducted, in which WaveNet is used as a vocoder and as an integrated converter–vocoder, respectively. Experiments on the CMU-ARCTIC and CSTR-VCTK corpora show that the factorized WaveNet consistently outperforms the original WaveNet when trained on the same amount of data. We also apply SVD in the same way to the real-time neural vocoder Parallel WaveGAN for voice conversion and observe a similar improvement.
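
The parameter saving from applying SVD to a convolution layer can be illustrated with a minimal NumPy sketch. It assumes a dilated-convolution weight of shape (out_channels, in_channels, kernel_size) that is flattened to a matrix and truncated to a chosen rank. The function name `factorize_conv_weight`, the layer sizes, and the rank are hypothetical, and the abstract does not state how the paper reshapes or truncates the weights, so this should not be read as the authors' exact method.

```python
import numpy as np


def factorize_conv_weight(weight, rank):
    """Approximate a conv weight by two low-rank factors via truncated SVD.

    weight: array of shape (out_channels, in_channels, kernel_size).
    Returns (left, right) with shapes (out_channels, rank) and
    (rank, in_channels * kernel_size). Hypothetical helper for illustration.
    """
    out_ch, in_ch, kernel = weight.shape
    # Flatten the input-channel and kernel axes so the weight is a 2-D matrix.
    matrix = weight.reshape(out_ch, in_ch * kernel)
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the top `rank` singular values; fold them into the left factor.
    left = u[:, :rank] * s[:rank]
    right = vt[:rank, :]
    return left, right


# Illustrative sizes only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 64, 2))

left, right = factorize_conv_weight(weight, rank=16)
original_params = weight.size
factorized_params = left.size + right.size

# Low-rank approximation of the original weight, back in its 3-D shape.
approx = (left @ right).reshape(weight.shape)

print(original_params, factorized_params)
```

With these illustrative sizes, the 8192 original weights are replaced by 3072 weights in the two factors. In a network, the two factors would correspond to two smaller stacked layers, and the rank controls the trade-off between parameter count and approximation error.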




Updated: 2021-04-21