A parametric prosody coding approach for Mandarin speech using a hierarchical prosodic model
EURASIP Journal on Audio, Speech, and Music Processing (IF 2.4), Pub Date: 2018-07-11, DOI: 10.1186/s13636-018-0129-5
Chen-Yu Chiang

This paper proposes a novel parametric prosody coding approach for Mandarin speech. The encoder uses a hierarchical prosodic model (HPM) as a prosody-generating model: it analyzes the prosody of the input utterance and obtains, for encoding, a parametric representation of four prosodic-acoustic features (syllable pitch contour, syllable duration, syllable energy level, and syllable-juncture pause duration). The decoder reconstructs the four prosodic-acoustic features by a synthesis operation on the decoded HPM parameters. The reconstructed prosodic features are then passed to an HMM-based speech synthesizer to generate the reconstructed speech. Objective and subjective evaluations showed that the proposed approach encoded speech with better quality and at a lower data rate than a conventional segment-based coding scheme using vector or scalar quantization. Speech reconstructed by the proposed approach has good quality at low data rates of 81.4 and 72.7 bps for the speaker-dependent and speaker-independent tasks, respectively. The paper also illustrates an application to speaking rate conversion, in which the HPM parameters are changed directly to those of a different speaking rate. An informal listening test confirmed that the converted speech sounded very smooth at both high and low speaking rates.
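For orientation, the kind of per-syllable scalar quantization that the abstract names as a conventional baseline can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's HPM-based coder and not its baseline configuration: the feature ranges, the bit allocations, the reduction of the pitch contour to a single mean value, and all names (`ScalarQuantizer`, `QUANTIZERS`, `data_rate_bps`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScalarQuantizer:
    """Uniform scalar quantizer over [lo, hi] using 2**bits levels."""
    lo: float
    hi: float
    bits: int

    def encode(self, x: float) -> int:
        # Clip to the supported range, then map to the nearest level index.
        max_index = (1 << self.bits) - 1
        clipped = min(max(x, self.lo), self.hi)
        return round((clipped - self.lo) / (self.hi - self.lo) * max_index)

    def decode(self, index: int) -> float:
        # Map a level index back to its reconstruction value.
        max_index = (1 << self.bits) - 1
        return self.lo + index / max_index * (self.hi - self.lo)


# Hypothetical ranges and bit allocations for the four prosodic-acoustic
# features; none of these numbers come from the paper. The pitch contour
# is simplified to one mean value per syllable.
QUANTIZERS = {
    "pitch_hz": ScalarQuantizer(50.0, 400.0, 6),
    "duration_s": ScalarQuantizer(0.05, 0.60, 5),
    "energy_db": ScalarQuantizer(30.0, 90.0, 5),
    "pause_s": ScalarQuantizer(0.0, 1.0, 4),
}


def encode_syllable(features: dict) -> dict:
    """Quantize each feature of one syllable to an integer index."""
    return {name: QUANTIZERS[name].encode(value) for name, value in features.items()}


def decode_syllable(indices: dict) -> dict:
    """Reconstruct feature values from the quantizer indices."""
    return {name: QUANTIZERS[name].decode(index) for name, index in indices.items()}


def data_rate_bps(syllables_per_second: float) -> float:
    """Bits per second = bits per syllable times syllable rate."""
    bits_per_syllable = sum(q.bits for q in QUANTIZERS.values())
    return bits_per_syllable * syllables_per_second


if __name__ == "__main__":
    syllable = {"pitch_hz": 220.0, "duration_s": 0.21, "energy_db": 65.0, "pause_s": 0.0}
    indices = encode_syllable(syllable)
    print(indices)
    print(decode_syllable(indices))
    print(data_rate_bps(4.5), "bps at an assumed 4.5 syllables per second")
```

In a scheme like this the data rate grows linearly with the bits spent on each syllable, which is why a coder that transmits model parameters instead of per-syllable feature values can, in principle, reach a lower rate.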

Updated: 2018-07-11