Eigenresiduals for improved Parametric Speech Synthesis
arXiv - CS - Sound, Pub Date: 2020-01-02, DOI: arxiv-2001.00581
Thomas Drugman, Geoffrey Wilfart, Thierry Dutoit
Statistical parametric speech synthesizers have recently shown their ability
to produce natural-sounding and flexible voices. Unfortunately, the delivered
quality suffers from a typical buzziness, because the speech is vocoded. This
paper proposes a new excitation model to reduce this undesirable effect. The
model decomposes pitch-synchronous residual frames on an orthonormal basis
obtained by Principal Component Analysis (PCA). This basis contains a limited
number of eigenresiduals and is computed on a relatively small speech
database. A stream of PCA-based coefficients is added to our HMM-based
synthesizer, which allows the voiced excitation to be generated during
synthesis. An improvement over the traditional excitation is reported, while
the footprint of the synthesis engine remains under about 1Mb.
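The PCA decomposition described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the residual frames have already been extracted pitch-synchronously and brought to a common length N (the abstract does not say how), it centers the frames on their mean before PCA, it finds the leading eigenvectors with power iteration plus Gram-Schmidt deflation instead of a library eigensolver, and it runs on synthetic toy frames rather than real speech. All function names (`toy_frames`, `eigenresiduals`, `project`, `reconstruct`) are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("zero vector: fewer than k independent directions in the data")
    return [x / norm for x in v]


def toy_frames(m=200, n=16, seed=1):
    """Hypothetical stand-in for length-normalized residual frames (synthetic, not speech)."""
    rng = random.Random(seed)
    pulse = [math.exp(-((i - n / 2) ** 2) / 2.0) for i in range(n)]  # pulse-like bump
    mode1 = [math.sin(2 * math.pi * i / n) for i in range(n)]          # strong variation mode
    mode2 = [math.cos(2 * math.pi * 3 * i / n) for i in range(n)]      # weaker variation mode
    frames = []
    for _ in range(m):
        a, b = rng.gauss(0, 1.0), rng.gauss(0, 0.3)
        frames.append([p + a * u + b * w + rng.gauss(0, 0.01)
                       for p, u, w in zip(pulse, mode1, mode2)])
    return frames


def eigenresiduals(frames, k, iters=300, seed=0):
    """Return the mean frame and the top-k orthonormal principal axes ("eigenresiduals")."""
    m, n = len(frames), len(frames[0])
    mu = [sum(f[i] for f in frames) / m for i in range(n)]
    # Sample covariance of the mean-centered frames (assumption: PCA on centered data).
    cov = [[0.0] * n for _ in range(n)]
    for f in frames:
        d = [f[i] - mu[i] for i in range(n)]
        for i in range(n):
            for j in range(n):
                cov[i][j] += d[i] * d[j] / m
    rng = random.Random(seed)
    basis = []
    for _ in range(k):
        v = normalize([rng.gauss(0, 1) for _ in range(n)])
        for _ in range(iters):
            w = [dot(row, v) for row in cov]  # power-iteration step: w = C v
            for b in basis:                   # remove components along earlier axes
                p = dot(w, b)
                w = [wi - p * bi for wi, bi in zip(w, b)]
            v = normalize(w)
        basis.append(v)
    return mu, basis


def project(frame, mu, basis):
    """PCA coefficients of one frame: the per-frame values a coefficient stream would carry."""
    d = [x - c for x, c in zip(frame, mu)]
    return [dot(d, b) for b in basis]


def reconstruct(coeffs, mu, basis):
    """Rebuild an excitation frame from the mean plus a weighted sum of eigenresiduals."""
    out = list(mu)
    for c, b in zip(coeffs, basis):
        out = [o + c * bi for o, bi in zip(out, b)]
    return out


if __name__ == "__main__":
    frames = toy_frames()
    mu, basis = eigenresiduals(frames, k=2)
    coeffs = project(frames[0], mu, basis)
    approx = reconstruct(coeffs, mu, basis)
```

Keeping only a small k is what keeps the stored basis compact. In this sketch, how the rebuilt frames would be placed at pitch marks and joined into a continuous excitation signal is deliberately left out.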
Updated: 2020-01-06