DiffWave: A Versatile Diffusion Model for Audio Synthesis
arXiv - CS - Sound, Pub Date: 2020-09-21, DOI: arxiv-2009.09761
Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro
In this work, we propose DiffWave, a versatile Diffusion probabilistic model
for conditional and unconditional Waveform generation. The model is
non-autoregressive and converts a white noise signal into a structured
waveform through a Markov chain with a constant number of steps at synthesis.
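The synthesis procedure in the sentences above can be sketched as a reverse Markov chain that starts from white noise and runs a fixed number of denoising steps. This is a minimal sketch that assumes the standard DDPM reverse process of Ho et al. (2020), on which DiffWave builds. It is not the authors' code: the noise predictor `dummy_eps_model` is a hypothetical placeholder for the trained network, and the helper names and linear schedule endpoints (1e-4 to 0.05, T = 50) are illustrative assumptions.

```python
import math
import random
from typing import Callable, List


def dummy_eps_model(x: List[float], t: int) -> List[float]:
    """Hypothetical stand-in for the trained noise predictor eps_theta(x_t, t).

    It always predicts zero noise, so the loop below runs end to end but
    does not produce meaningful audio.
    """
    return [0.0] * len(x)


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.05) -> List[float]:
    """Linearly spaced variances beta_1..beta_T (illustrative endpoints)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def sample(eps_model: Callable[[List[float], int], List[float]],
           length: int, betas: List[float], seed: int = 0) -> List[float]:
    """Reverse Markov chain x_T -> x_0 with exactly T = len(betas) steps."""
    rng = random.Random(seed)
    alphas = [1.0 - b for b in betas]
    alpha_bars: List[float] = []
    prod = 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    # x_T ~ N(0, I): start from white noise.
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]
    for t in reversed(range(len(betas))):  # 0-based index t is diffusion step s = t + 1
        eps = eps_model(x, t)
        # Model mean: (x_s - beta_s / sqrt(1 - alpha_bar_s) * eps) / sqrt(alpha_s)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        x = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # For s > 1 add noise with variance
            # (1 - alpha_bar_{s-1}) / (1 - alpha_bar_s) * beta_s;
            # the last step (s = 1) returns the mean without noise.
            var = (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t]
            x = [xi + math.sqrt(var) * rng.gauss(0.0, 1.0) for xi in x]
    return x


# Usage: count network calls to show the step count does not depend on length.
calls: List[int] = []


def counting_model(x: List[float], t: int) -> List[float]:
    calls.append(t)
    return dummy_eps_model(x, t)


audio = sample(counting_model, length=1000, betas=linear_beta_schedule(50))
print(len(audio), len(calls))  # 1000 50
```

The loop body runs exactly T times no matter how many samples are generated, and each step processes the whole waveform in parallel. An autoregressive model such as WaveNet instead needs one sequential network evaluation per output sample, which is why a fixed, small T allows much faster synthesis.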
DiffWave is trained efficiently by optimizing a variant of the variational bound on
the data likelihood. It produces high-fidelity audio in Different Waveform
generation tasks, including neural vocoding conditioned on mel spectrograms,
class-conditional generation, and unconditional generation. We demonstrate that
DiffWave matches a strong WaveNet vocoder in terms of speech quality (MOS: 4.44
versus 4.43) while synthesizing orders of magnitude faster. In particular, it
significantly outperforms autoregressive and GAN-based waveform models on the
challenging unconditional generation task in terms of audio quality and sample
diversity, as measured by various automatic and human evaluations.
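The abstract states that DiffWave is trained by optimizing "a variant of variational bound on the data likelihood." The sketch below assumes that this variant is the unweighted noise-prediction objective used for denoising diffusion models by Ho et al. (2020). Under that objective, training draws a step t and Gaussian noise, diffuses the clean waveform to x_t in closed form, and regresses the noise. The helper names, the schedule, the toy 440 Hz tone, and the zero predictor are illustrative assumptions rather than the authors' implementation.

```python
import math
import random
from typing import Callable, List, Sequence


def cumulative_alphas(betas: Sequence[float]) -> List[float]:
    """alpha_bar_t = prod_{s <= t} (1 - beta_s) for every step t."""
    out: List[float] = []
    prod = 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0: List[float], t: int, eps: List[float],
             alpha_bars: Sequence[float]) -> List[float]:
    """Closed-form forward diffusion x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps."""
    a = math.sqrt(alpha_bars[t])
    s = math.sqrt(1.0 - alpha_bars[t])
    return [a * xi + s * ei for xi, ei in zip(x0, eps)]


def unweighted_loss(eps_model: Callable[[List[float], int], List[float]],
                    x0: List[float], betas: Sequence[float],
                    rng: random.Random) -> float:
    """Single-sample estimate of E_{t, eps} ||eps - eps_theta(x_t, t)||^2.

    The squared error is averaged over waveform samples rather than summed,
    which only rescales the loss.
    """
    alpha_bars = cumulative_alphas(betas)
    t = rng.randrange(len(betas))              # step index drawn uniformly
    eps = [rng.gauss(0.0, 1.0) for _ in x0]    # eps ~ N(0, I)
    x_t = q_sample(x0, t, eps, alpha_bars)
    pred = eps_model(x_t, t)
    return sum((e - p) ** 2 for e, p in zip(eps, pred)) / len(x0)


# Usage on a toy 440 Hz tone at a hypothetical 16 kHz rate, with an untrained
# predictor that always outputs zero (its expected loss is 1).
def zero_model(x_t: List[float], t: int) -> List[float]:
    return [0.0] * len(x_t)


betas = [1e-4 + i * (0.05 - 1e-4) / 49 for i in range(50)]
x0 = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(256)]
loss = unweighted_loss(zero_model, x0, betas, random.Random(0))
print(f"loss with an all-zero predictor: {loss:.3f}")
```

Because x_t has a closed form in x_0, each training step needs only one network evaluation at a randomly chosen t, not a pass through the whole chain. This property makes the objective cheap to optimize.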
Updated: 2020-09-22