Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network
arXiv - CS - Sound. Pub Date: 2020-07-11. DOI: arxiv-2007.05663
Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, and Tomoki Toda
In this paper, a pitch-adaptive waveform generative model named
Quasi-Periodic WaveNet (QPNet) is proposed to improve the pitch controllability
of vanilla WaveNet (WN) using pitch-dependent dilated convolution neural
networks (PDCNNs). Specifically, as a probabilistic autoregressive generation
model with stacked dilated convolution layers, WN achieves high-fidelity audio
waveform generation. However, the pure-data-driven nature and the lack of prior
knowledge of audio signals degrade the pitch controllability of WN. For
instance, it is difficult for WN to precisely generate the periodic components
of audio signals when the given auxiliary fundamental frequency (F0) features
are outside the F0 range observed in the training data. To address this
problem, QPNet with two novel designs is proposed. First, the PDCNN component
is applied to dynamically change the network architecture of WN according to
the given auxiliary F0 features. Second, a cascaded network structure is
utilized to simultaneously model the long- and short-term dependencies of
quasi-periodic signals such as speech. Performance is evaluated on single-tone
sinusoid generation and speech generation. The experimental results show the
effectiveness of the PDCNNs for unseen auxiliary F0 features and the
effectiveness of the cascaded structure for speech generation.
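
The abstract says the PDCNN changes the network according to the auxiliary F0 but gives no formula. The following is a minimal sketch of one way a pitch-dependent dilation could work, not the authors' implementation. It assumes the dilation of a causal, kernel-size-2 convolution is scaled by sample_rate / (F0 * dense_factor), so that a lower F0 (longer pitch period) reaches further back in time. The scaling rule, the `dense_factor` parameter, the fallback for unvoiced samples, and all function names are illustrative assumptions.

```python
def pitch_dependent_dilation(f0_hz, sample_rate_hz, dense_factor, base_dilation):
    """Return an illustrative dilation (in samples) for one time step.

    Assumption: the fixed base dilation is stretched by
    sample_rate / (f0 * dense_factor). Unvoiced samples (f0 <= 0)
    fall back to the fixed base dilation.
    """
    if f0_hz <= 0:
        return base_dilation
    scale = sample_rate_hz / (f0_hz * dense_factor)
    return max(1, round(scale * base_dilation))


def pitch_dependent_dilated_conv(x, f0, sample_rate_hz, dense_factor,
                                 base_dilation, w_past, w_current):
    """Causal kernel-size-2 convolution whose dilation varies per sample."""
    y = []
    for t, sample in enumerate(x):
        d = pitch_dependent_dilation(f0[t], sample_rate_hz,
                                     dense_factor, base_dilation)
        # Zero padding for taps that would fall before the start of the signal.
        past = x[t - d] if t - d >= 0 else 0.0
        y.append(w_past * past + w_current * sample)
    return y


if __name__ == "__main__":
    # With a 16 kHz sampling rate and dense_factor 4, a 100 Hz pitch
    # stretches a base dilation of 1 sample to 40 samples.
    print(pitch_dependent_dilation(100.0, 16000, 4, 1))
```

In this sketch the weights are shared across time steps and only the position of the past tap moves with F0, which is one way a model could follow an F0 value it never saw during training.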
Updated: 2020-11-12