The Deterministic plus Stochastic Model of the Residual Signal and its Applications
arXiv - CS - Sound, Pub Date: 2019-12-29, DOI: arxiv-2001.01000
Thomas Drugman, Thierry Dutoit

The modeling of speech production often relies on a source-filter approach. Although methods for parameterizing the filter have reached a certain maturity, several speech processing applications still have much to gain from an appropriate excitation model. This manuscript presents a Deterministic plus Stochastic Model (DSM) of the residual signal. The DSM consists of two contributions acting in two distinct spectral bands delimited by a maximum voiced frequency. Both components are extracted from an analysis performed on a speaker-dependent dataset of pitch-synchronous residual frames. The deterministic part models the low-frequency contents and arises from an orthonormal decomposition of these frames. The stochastic component is a high-frequency noise modulated both in time and frequency. Some interesting phonetic and computational properties of the DSM are also highlighted. The applicability of the DSM in two fields of speech processing is then studied. First, it is shown that incorporating the DSM vocoder in HMM-based speech synthesis enhances the delivered quality. The proposed approach turns out to significantly outperform the traditional pulse excitation and provides a quality equivalent to STRAIGHT. In a second application, the potential of glottal signatures derived from the proposed DSM is investigated for speaker identification. Interestingly, these signatures are shown to lead to better recognition rates than other glottal-based methods.
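
To make the two-band structure concrete, here is a minimal Python sketch of the idea described in the abstract. It is not the authors' implementation. It assumes the pitch-synchronous residual frames have already been resampled to a common length, it uses an SVD-based principal component as one possible orthonormal decomposition, it applies an ideal FFT brick-wall split at the maximum voiced frequency, and it uses a Hann window as a stand-in time envelope. The spectral shaping of the noise is omitted. All function names and parameter values are hypothetical.

```python
import numpy as np


def first_eigenresidual(frames):
    """Unit-norm first principal direction of a (n_frames, frame_len) array.

    Assumption: an SVD-based PCA stands in for the orthonormal decomposition.
    """
    centered = frames - frames.mean(axis=0)
    # Rows of vt are orthonormal directions sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


def band_split(signal, cutoff_hz, sample_rate, keep="low"):
    """Keep FFT bins on one side of cutoff_hz (ideal brick-wall filter)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    mask = freqs <= cutoff_hz if keep == "low" else freqs > cutoff_hz
    return np.fft.irfft(spectrum * mask, n=len(signal))


def dsm_frame(deterministic, max_voiced_hz, sample_rate, rng):
    """Build one excitation frame: low-band waveform plus high-band noise."""
    n = len(deterministic)
    low = band_split(deterministic, max_voiced_hz, sample_rate, keep="low")
    noise = rng.standard_normal(n)
    high = band_split(noise, max_voiced_hz, sample_rate, keep="high")
    # Hypothetical time envelope; a real one would be estimated from data.
    envelope = np.hanning(n)
    return low + envelope * high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-in data; real input would be normalized residual frames.
    frames = rng.standard_normal((50, 64))
    eigen = first_eigenresidual(frames)
    frame = dsm_frame(eigen, max_voiced_hz=4000.0, sample_rate=16000, rng=rng)
    print(frame.shape)
```

The 4 kHz cutoff and 16 kHz sample rate in the usage lines are placeholder values chosen for illustration, not figures taken from the abstract.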

Updated: 2020-01-07