DiffWave: A Versatile Diffusion Model for Audio Synthesis
arXiv - CS - Sound. Pub Date: 2020-09-21, DOI: arXiv:2009.09761
Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro

In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation. The model is non-autoregressive and converts a white-noise signal into a structured waveform through a Markov chain with a constant number of steps at synthesis. It is efficiently trained by optimizing a variant of the variational bound on the data likelihood. DiffWave produces high-fidelity audio across different waveform generation tasks, including neural vocoding conditioned on mel spectrograms, class-conditional generation, and unconditional generation. We demonstrate that DiffWave matches a strong WaveNet vocoder in terms of speech quality (MOS: 4.44 versus 4.43), while synthesizing orders of magnitude faster. In particular, it significantly outperforms autoregressive and GAN-based waveform models on the challenging unconditional generation task in terms of audio quality and sample diversity, as measured by various automatic and human evaluations.
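The Markov chain described in the abstract can be illustrated with a minimal DDPM-style sketch (following Ho et al. 2020, the formulation DiffWave builds on). Everything here, including the schedule values, step count, and function names, is an illustrative assumption, not DiffWave's actual implementation; in the real model, `eps_pred` would come from a trained dilated-convolution network rather than being supplied directly.

```python
import numpy as np

T = 50                                  # assumed constant number of diffusion steps
betas = np.linspace(1e-4, 0.05, T)      # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def q_sample(x0, t, rng):
    """Forward process: corrupt a clean waveform x0 to step t in closed form.

    Returns the noisy sample x_t and the injected noise; training minimizes
    the MSE between this noise and the network's prediction, a variant of the
    variational bound on the data likelihood.
    """
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise

def reverse_step(x_t, t, eps_pred, rng):
    """One reverse (denoising) Markov-chain step, given predicted noise eps_pred."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bar[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t > 0:  # no noise is added at the final step
        mean = mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)
    return mean

rng = np.random.default_rng(0)
x0 = np.sin(np.linspace(0, 4 * np.pi, 256))  # stand-in "waveform"
x_T, noise = q_sample(x0, T - 1, rng)        # after T steps, x_T is near white noise
x_prev = reverse_step(x_T, T - 1, noise, rng)
```

Because the number of steps `T` is fixed, synthesis always costs `T` network evaluations regardless of waveform length, which is why the model is orders of magnitude faster than sample-by-sample autoregressive vocoders such as WaveNet.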

Updated: 2020-09-22