Non-parallel Voice Conversion System with WaveNet Vocoder and Collapsed Speech Suppression
arXiv - CS - Sound, Pub Date: 2020-03-26, DOI: arxiv-2003.11750
Yi-Chiao Wu, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Hayashi, and Tomoki Toda

In this paper, we integrate a simple non-parallel voice conversion (VC) system with a WaveNet (WN) vocoder and a proposed collapsed speech suppression technique. The effectiveness of WN as a vocoder for generating high-fidelity speech waveforms on the basis of acoustic features has been confirmed in recent works. However, when combining the WN vocoder with a VC system, the distorted acoustic features, acoustic and temporal mismatches, and exposure bias usually lead to significant speech quality degradation, making WN generate some very noisy speech segments called collapsed speech. To tackle the problem, we take conventional-vocoder-generated speech as the reference speech to derive a linear predictive coding distribution constraint (LPCDC) to avoid the collapsed speech problem. Furthermore, to mitigate the negative effects introduced by the LPCDC, we propose a collapsed speech segment detector (CSSD) to ensure that the LPCDC is only applied to the problematic segments to limit the loss of quality to short periods. Objective and subjective evaluations are conducted, and the experimental results confirm the effectiveness of the proposed method, which further improves the speech quality of our previous non-parallel VC system submitted to Voice Conversion Challenge 2018.
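
The abstract's idea of applying a constraint only to problematic segments can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the abstract does not say how the CSSD decides, so the frame-power comparison against the reference speech, the function names, the frame length, and the 6 dB threshold are all hypothetical and are not the authors' detector.

```python
import math


def frame_power_db(signal, frame_len):
    """Mean power of each non-overlapping frame, in dB (floored to avoid log(0))."""
    powers = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        powers.append(10.0 * math.log10(max(mean_sq, 1e-10)))
    return powers


def detect_collapsed_frames(generated, reference, frame_len=4, threshold_db=6.0):
    """Flag frames where the generated waveform is much louder than the reference.

    Hypothetical criterion for illustration only; the abstract does not
    specify how the CSSD makes its decision.
    """
    gen_db = frame_power_db(generated, frame_len)
    ref_db = frame_power_db(reference, frame_len)
    return [g - r > threshold_db for g, r in zip(gen_db, ref_db)]


def frames_to_segments(flags, frame_len):
    """Merge consecutive flagged frames into (start_sample, end_sample) ranges."""
    segments = []
    start = None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            segments.append((start * frame_len, i * frame_len))
            start = None
    if start is not None:
        segments.append((start * frame_len, len(flags) * frame_len))
    return segments


# Toy signals: the generated waveform matches the reference
# except for a noisy burst in the second frame.
reference = [0.1, -0.1, 0.1, -0.1] * 3
generated = [0.1, -0.1, 0.1, -0.1] + [0.9, -0.8, 0.9, -0.8] + [0.1, -0.1, 0.1, -0.1]

flags = detect_collapsed_frames(generated, reference)
segments = frames_to_segments(flags, frame_len=4)
print(flags, segments)
```

In a pipeline of this shape, only the sample ranges in `segments` would be regenerated under the constraint, while the rest of the waveform would keep the unconstrained output. That gating is what limits any quality loss to short periods.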

Updated: 2020-04-08