Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network


arXiv - CS - Sound. Pub Date: 2020-07-11, DOI: arxiv-2007.05663
Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, and Tomoki Toda

This paper proposes Quasi-Periodic WaveNet (QPNet), a pitch-adaptive waveform generative model that improves the pitch controllability of vanilla WaveNet (WN) by using pitch-dependent dilated convolution neural networks (PDCNNs). As a probabilistic autoregressive generative model with stacked dilated convolution layers, WN achieves high-fidelity audio waveform generation. However, its purely data-driven nature and lack of prior knowledge about audio signals degrade its pitch controllability. For instance, WN has difficulty precisely generating the periodic components of audio signals when the given auxiliary fundamental frequency (F0) features lie outside the F0 range observed in the training data. To address this problem, QPNet introduces two novel designs. First, the PDCNN component dynamically changes the network architecture of WN according to the given auxiliary F0 features. Second, a cascaded network structure simultaneously models the long- and short-term dependencies of quasi-periodic signals such as speech. Both single-tone sinusoid generation and speech generation are evaluated. The experimental results show the effectiveness of the PDCNNs for unseen auxiliary F0 features and the effectiveness of the cascaded structure for speech generation.
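The idea of a pitch-dependent dilated convolution can be illustrated with a minimal NumPy sketch. The abstract does not give the dilation rule, so everything below is an assumption made for illustration: the dilation at each time step is taken to be proportional to the pitch period in samples, `fs / (f0 * dense_factor)`, and the names `pitch_dependent_dilation`, `pd_causal_conv`, `base_dilation`, and `dense_factor` are hypothetical. The fallback to the fixed base dilation for unvoiced frames (F0 of 0) is likewise an assumption, and the two scalar weights stand in for the learned kernel of a real layer.

```python
import numpy as np

def pitch_dependent_dilation(f0, fs, base_dilation=1, dense_factor=4):
    """Per-sample dilation (in samples) derived from the auxiliary F0.

    Assumed rule: scale the base dilation by fs / (f0 * dense_factor), so
    the taps of the convolution stay a fixed fraction of a pitch period apart.
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    scale = np.ones_like(f0)  # assumed fallback: fixed dilation when unvoiced
    scale[voiced] = fs / (f0[voiced] * dense_factor)
    return np.maximum(1, np.rint(scale * base_dilation)).astype(int)

def pd_causal_conv(x, f0, fs, w_past, w_cur, base_dilation=1, dense_factor=4):
    """Two-tap causal convolution whose lag follows the pitch.

    y[t] = w_past * x[t - d_t] + w_cur * x[t], with d_t from the F0 at time t.
    Samples before the start of the signal are treated as zero.
    """
    x = np.asarray(x, dtype=float)
    d = pitch_dependent_dilation(f0, fs, base_dilation, dense_factor)
    idx = np.arange(len(x)) - d
    past = np.where(idx >= 0, x[np.clip(idx, 0, None)], 0.0)
    return w_past * past + w_cur * x

if __name__ == "__main__":
    fs = 16000
    # Doubling F0 halves the lag; the unvoiced frame keeps the base dilation.
    print(pitch_dependent_dilation([100.0, 200.0, 0.0], fs))
```

Under this assumed rule the receptive field stretches or shrinks with the pitch period, so an F0 value outside the training range still yields taps at the same relative positions within a cycle. A fixed-dilation layer has no comparable mechanism.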


Updated: 2020-11-12
