WaveFlow: A Compact Flow-based Model for Raw Audio


arXiv - CS - Sound, Pub Date: 2019-12-03, DOI: arxiv-1912.01219
Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song

In this work, we propose WaveFlow, a small-footprint generative flow for raw audio that is trained directly with maximum likelihood. It handles the long-range structure of 1-D waveforms with a dilated 2-D convolutional architecture, while modeling local variations with expressive autoregressive functions. WaveFlow provides a unified view of likelihood-based models for 1-D data, including WaveNet and WaveGlow as special cases. It generates speech with fidelity comparable to WaveNet while synthesizing several orders of magnitude faster, since it requires only a few sequential steps to generate very long waveforms with hundreds of thousands of time-steps. Furthermore, it significantly reduces the likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Finally, our small-footprint WaveFlow has only 5.91M parameters, which is 15× smaller than WaveGlow. It can generate 22.05 kHz high-fidelity audio 42.6× faster than real-time (at a rate of 939.3 kHz) on a V100 GPU without engineered inference kernels.
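
The throughput figures in the abstract can be cross-checked with a little arithmetic: a speedup over real-time multiplied by the output sample rate gives the number of samples produced per wall-clock second. The sketch below is only an illustration of that relationship; the function names `synthesis_rate_khz` and `seconds_to_generate` are hypothetical helpers, not part of any WaveFlow codebase, and the only inputs are the two numbers quoted in the abstract.

```python
# Values quoted in the abstract.
SAMPLE_RATE_HZ = 22_050       # 22.05 kHz output audio
SPEEDUP_VS_REALTIME = 42.6    # reported speedup over real-time on a V100


def synthesis_rate_khz(sample_rate_hz: float, speedup: float) -> float:
    """Samples generated per second of wall-clock time, in kHz (hypothetical helper)."""
    return sample_rate_hz * speedup / 1000.0


def seconds_to_generate(audio_seconds: float, speedup: float) -> float:
    """Wall-clock time needed to synthesize a clip of the given duration (hypothetical helper)."""
    return audio_seconds / speedup


rate = synthesis_rate_khz(SAMPLE_RATE_HZ, SPEEDUP_VS_REALTIME)
print(f"{rate:.1f} kHz")  # 939.3 kHz, matching the rate stated in the abstract
```

Under the same assumption, `seconds_to_generate(10.0, SPEEDUP_VS_REALTIME)` estimates how long a 10-second clip would take to synthesize at the reported speedup; it says nothing about other hardware or sample rates.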


Updated: 2020-06-26
