SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis

arXiv - CS - Sound. Pub Date: 2020-01-16, DOI: arxiv-2001.05685
Bohan Zhai, Tianren Gao, Flora Xue, Daniel Rothchild, Bichen Wu, Joseph E. Gonzalez, Kurt Keutzer

Automatic speech synthesis is a challenging task that is becoming increasingly important as edge devices begin to interact with users through speech. Typical text-to-speech pipelines include a vocoder, which translates intermediate audio representations into an audio waveform. Most existing vocoders are difficult to parallelize since each generated sample is conditioned on previous samples. WaveGlow is a flow-based feed-forward alternative to these auto-regressive models (Prenger et al., 2019). However, while WaveGlow can be easily parallelized, the model is too expensive for real-time speech synthesis on the edge. This paper presents SqueezeWave, a family of lightweight vocoders based on WaveGlow that can generate audio of similar quality to WaveGlow with 61x to 214x fewer multiply-accumulate operations (MACs). Code, trained models, and generated audio are publicly available at https://github.com/tianrengao/SqueezeWave.

Updated: 2020-01-17
