Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation,arXiv - CS - Sound

arXiv - CS - Sound, Pub Date: 2019-07-01, DOI: arxiv-1907.00797
Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda

In this paper, we propose a quasi-periodic neural network (QPNet) vocoder with a novel network architecture named pitch-dependent dilated convolution (PDCNN) to improve the pitch controllability of the WaveNet (WN) vocoder. The WN vocoder has recently been shown to generate high-fidelity speech samples from given acoustic features. However, because of its fixed dilated convolutions and generic network architecture, the WN vocoder can hardly generate speech with F0 values outside the range observed in the training data. Consequently, the WN vocoder lacks pitch controllability, which is one of the essential capabilities of conventional vocoders. To address this limitation, we propose the PDCNN component, whose time-variant adaptive dilation size depends on the given F0 values, and a cascade network structure for the QPNet vocoder to generate quasi-periodic signals such as speech. Both objective and subjective tests were conducted, and the experimental results demonstrate that the QPNet vocoder achieves better pitch controllability than WN vocoders of the same size and of double the size, while attaining comparable speech quality.

Index Terms: WaveNet, vocoder, quasi-periodic signal, pitch-dependent dilated convolution, pitch controllability
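The abstract says only that the PDCNN dilation size is time-variant and tied to the given F0 values. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation: it assumes the base dilation is scaled by the pitch period in samples (sample_rate / F0) divided by a hypothetical `dense_factor`, that unvoiced samples (F0 = 0) fall back to the base dilation, and that the convolution has kernel size 2 with scalar weights. The function names, parameters, and the 16 kHz default sample rate are all illustrative.

```python
from typing import List, Sequence


def pitch_dependent_dilations(
    f0: Sequence[float],
    base_dilation: int,
    sample_rate: int = 16000,  # assumed sample rate, not from the source
    dense_factor: int = 4,     # hypothetical parameter, not from the source
) -> List[int]:
    """Return one dilation per sample, scaled by the local pitch period.

    Assumptions: the scale factor is the pitch period in samples
    (sample_rate / f0) divided by `dense_factor`, rounded and clamped
    to at least 1; unvoiced samples (f0 <= 0) keep the base dilation.
    """
    dilations = []
    for f in f0:
        if f <= 0:
            scale = 1  # unvoiced: fall back to a fixed, WaveNet-style dilation
        else:
            scale = max(1, round(sample_rate / (f * dense_factor)))
        dilations.append(base_dilation * scale)
    return dilations


def pd_dilated_conv(
    x: Sequence[float],
    dilations: Sequence[int],
    w_past: float,
    w_now: float,
) -> List[float]:
    """Causal kernel-size-2 convolution with a per-sample dilation.

    y[t] = w_past * x[t - d_t] + w_now * x[t], with zeros before the start.
    """
    if len(x) != len(dilations):
        raise ValueError("x and dilations must have the same length")
    y = []
    for t, d in enumerate(dilations):
        past = x[t - d] if t - d >= 0 else 0.0  # zero padding on the left
        y.append(w_past * past + w_now * x[t])
    return y


# Usage: higher F0 -> shorter pitch period -> smaller dilation.
print(pitch_dependent_dilations([0.0, 100.0, 200.0, 400.0], base_dilation=2))
# → [2, 80, 40, 20]
print(pd_dilated_conv([1, 2, 3, 4, 5], [1, 1, 2, 2, 4], w_past=1.0, w_now=10.0))
# → [10.0, 21.0, 31.0, 42.0, 51.0]
```

In this sketch the distance between the two kernel taps grows with the pitch period, so the taps span roughly the same fraction of a pitch cycle regardless of F0, whereas a fixed WN dilation spans a different fraction of the cycle at each pitch. This is one plausible reading of why a pitch-dependent dilation could help with F0 values unseen in training.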

Updated: 2020-03-24
