Eigenresiduals for improved Parametric Speech Synthesis, arXiv - CS - Sound



Eigenresiduals for improved Parametric Speech Synthesis
arXiv - CS - Sound. Pub Date: 2020-01-02, DOI: arxiv-2001.00581
Thomas Drugman, Geoffrey Wilfart, Thierry Dutoit

Statistical parametric speech synthesizers have recently shown their ability to produce natural-sounding and flexible voices. Unfortunately, the delivered quality suffers from a typical buzziness because the speech is vocoded. This paper proposes a new excitation model to reduce this undesirable effect. The model is based on the decomposition of pitch-synchronous residual frames on an orthonormal basis obtained by Principal Component Analysis (PCA). This basis contains a limited number of eigenresiduals and is computed on a relatively small speech database. A stream of PCA-based coefficients is added to our HMM-based synthesizer and makes it possible to generate the voiced excitation during synthesis. An improvement over the traditional excitation is reported, while the footprint of the synthesis engine remains under about 1Mb.
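
The core idea in the abstract, projecting residual frames onto a small orthonormal PCA basis and rebuilding them from a few coefficients, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`compute_eigenresiduals`, `encode`, `decode`), the frame length, the number of components, and the synthetic data are all assumptions made for illustration. How the pitch-synchronous residual frames are extracted and normalized is not described in the abstract, so random stand-in frames are used here.

```python
import numpy as np


def compute_eigenresiduals(frames, n_components):
    """Run PCA on residual frames (one frame per row).

    Returns the mean frame and an orthonormal basis whose rows are the
    leading principal directions (the "eigenresiduals" in this sketch).
    """
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean(axis=0)
    centered = frames - mean
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def encode(frame, mean, basis):
    """Project one frame onto the basis to obtain its PCA coefficients."""
    return basis @ (frame - mean)


def decode(coeffs, mean, basis):
    """Rebuild an approximate frame from its PCA coefficients."""
    return mean + basis.T @ coeffs


# Synthetic stand-in data (assumption): 200 frames of 64 samples built from
# three underlying shapes plus a little noise. Real residual frames would
# come from an analysis front end that is not covered here.
rng = np.random.default_rng(0)
n_frames, frame_len, n_components = 200, 64, 8
t = np.linspace(0.0, 1.0, frame_len)
shapes = np.stack([np.sin(2 * np.pi * k * t) for k in range(1, 4)])
weights = rng.normal(size=(n_frames, 3))
frames = weights @ shapes + 0.01 * rng.normal(size=(n_frames, frame_len))

mean, basis = compute_eigenresiduals(frames, n_components)
coeffs = encode(frames[0], mean, basis)
reconstructed = decode(coeffs, mean, basis)
```

In a synthesizer along the lines the abstract describes, a coefficient vector like `coeffs` would be the quantity carried in the extra parameter stream, and something like `decode` would turn it back into a voiced excitation frame at synthesis time. That mapping is an interpretation of the abstract, not a confirmed detail of the system.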


Updated: 2020-01-06
