A Deterministic plus Stochastic Model of the Residual Signal for Improved Parametric Speech Synthesis
arXiv - CS - Sound. Pub Date: 2019-12-29, DOI: arxiv-2001.00842
Thomas Drugman, Geoffrey Wilfart, Thierry Dutoit

Speech generated by parametric synthesizers generally suffers from a characteristic buzziness, similar to that encountered in old LPC-like vocoders. To alleviate this problem, a more suitable model of the excitation should be adopted. To this end, we propose an adaptation of the Deterministic plus Stochastic Model (DSM) for the residual. In this model, the excitation is divided into two distinct spectral bands delimited by the maximum voiced frequency. The deterministic part covers the low-frequency content and consists of a decomposition of pitch-synchronous residual frames on an orthonormal basis obtained by Principal Component Analysis. The stochastic component is high-pass filtered noise whose time structure is modulated by an energy envelope, similar to what is done in the Harmonic plus Noise Model (HNM). The proposed residual model is integrated within an HMM-based speech synthesizer and is compared to the traditional excitation through a subjective test. Results show a significant improvement for both male and female voices. In addition, the proposed model requires little computational load and memory, which is essential for its integration in commercial applications.
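The abstract describes the two-band excitation only at a high level. The following NumPy snippet is a minimal sketch of that idea, not the authors' implementation, and rests on several assumptions: the residual frames are already pitch-synchronous and resampled to a common length; the split at the maximum voiced frequency `fm` is an ideal FFT-domain mask rather than a practical filter; a Hann window stands in for the energy envelope; and the function names (`pca_basis`, `dsm_excitation`, etc.), the noise gain, and the random stand-in data are all hypothetical.

```python
import numpy as np


def pca_basis(frames, k):
    """Return the mean frame and the first k orthonormal PCA directions.

    `frames` has shape (n_frames, frame_len); each row is assumed to be a
    pitch-synchronous residual frame already resampled to a common length.
    """
    mean = frames.mean(axis=0)
    # The rows of vt are orthonormal and sorted by decreasing variance.
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt[:k]


def project(frame, mean, basis):
    """PCA coefficients of one frame (basis rows are orthonormal)."""
    return (frame - mean) @ basis.T


def reconstruct(coeffs, mean, basis):
    """Rebuild a frame from its PCA coefficients."""
    return mean + coeffs @ basis


def _split(x, fs, fm, keep_low):
    """Ideal FFT-domain band split at the maximum voiced frequency fm (Hz)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    below = freqs < fm
    # Zero the band we do not keep, then return to the time domain.
    spec[~below if keep_low else below] = 0.0
    return np.fft.irfft(spec, n=len(x))


def lowpass(x, fs, fm):
    return _split(x, fs, fm, keep_low=True)


def highpass(x, fs, fm):
    return _split(x, fs, fm, keep_low=False)


def dsm_excitation(frame, mean, basis, fs, fm, envelope, noise_gain, rng):
    """One excitation frame: low-band PCA part plus modulated high-band noise."""
    coeffs = project(frame, mean, basis)
    deterministic = lowpass(reconstruct(coeffs, mean, basis), fs, fm)
    noise = highpass(rng.standard_normal(len(frame)), fs, fm)
    # Multiplying by the envelope shapes the noise in time; like any
    # time-domain modulation, it also smears the spectrum slightly below fm.
    stochastic = noise_gain * envelope * noise
    return deterministic + stochastic


# Usage with random stand-in data (not real speech residuals).
rng = np.random.default_rng(0)
fs, fm, n = 16000, 4000.0, 128
frames = rng.standard_normal((200, n))
mean, basis = pca_basis(frames, k=16)
envelope = np.hanning(n)  # hypothetical energy envelope
excitation = dsm_excitation(frames[0], mean, basis, fs, fm, envelope, 0.5, rng)
```

Splitting the bands with complementary FFT masks keeps the sketch dependency-free and makes the two components add back exactly to the input; a real synthesizer would use proper filters and a measured envelope.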

Updated: 2020-01-06