SqueezeWave: Extremely Lightweight Vocoders for On-device Speech Synthesis
arXiv - CS - Sound. Pub Date: 2020-01-16. DOI: arxiv-2001.05685. Authors: Bohan Zhai, Tianren Gao, Flora Xue, Daniel Rothchild, Bichen Wu, Joseph E. Gonzalez, Kurt Keutzer
Automatic speech synthesis is a challenging task that is becoming
increasingly important as edge devices begin to interact with users through
speech. Typical text-to-speech pipelines include a vocoder, which translates
intermediate audio representations into an audio waveform. Most existing
vocoders are difficult to parallelize since each generated sample is
conditioned on previous samples. WaveGlow is a flow-based feed-forward
alternative to these auto-regressive models (Prenger et al., 2019). However,
while WaveGlow can be easily parallelized, the model is too expensive for
real-time speech synthesis on the edge. This paper presents SqueezeWave, a
family of lightweight vocoders based on WaveGlow that can generate audio of
similar quality to WaveGlow with 61x - 214x fewer MACs. Code, trained models,
and generated audio are publicly available at
https://github.com/tianrengao/SqueezeWave.
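As a rough illustration of what a MAC (multiply-accumulate) reduction means, the sketch below counts MACs for a dense 1D convolution and compares two layer shapes. The function names and all layer dimensions are hypothetical and chosen only for the arithmetic; they are not the WaveGlow or SqueezeWave configurations, and the abstract does not state which changes produce the reported 61x - 214x savings.

```python
def conv1d_macs(in_channels, out_channels, kernel_size, output_length):
    """MACs for a dense 1D convolution (bias ignored).

    One multiply-accumulate is needed per output position, output channel,
    input channel, and kernel tap.
    """
    return output_length * out_channels * in_channels * kernel_size


def reduction_factor(baseline_macs, reduced_macs):
    """How many times fewer MACs the reduced layer needs."""
    return baseline_macs / reduced_macs


# Hypothetical shapes, for illustration only (not taken from the paper).
baseline = conv1d_macs(in_channels=256, out_channels=256,
                       kernel_size=3, output_length=2000)
reduced = conv1d_macs(in_channels=128, out_channels=128,
                      kernel_size=3, output_length=250)

print(f"baseline MACs: {baseline}")
print(f"reduced MACs:  {reduced}")
print(f"reduction:     {reduction_factor(baseline, reduced):.0f}x")
```

In this toy case, halving both channel counts gives a 4x saving and shortening the sequence by 8x gives another 8x, for 32x overall. This shows how savings from independent dimensions multiply in a single layer; it says nothing about how the paper's own figures were obtained.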
Updated: 2020-01-17