Speech Decomposition Based on a Hybrid Speech Model and Optimal Segmentation


arXiv - CS - Sound, Pub Date: 2021-05-04, DOI: arxiv-2105.01302
Alfredo Esquivel Jaramillo, Jesper Kjær Nielsen, Mads Græsbøll Christensen

In a hybrid speech model, voiced and unvoiced components can coexist within a segment. The voiced speech is often regarded as the deterministic component, and the unvoiced speech and additive noise as the stochastic components. The speech signal is typically assumed to be stationary within fixed segments of 20-40 ms, but the degree of stationarity varies over time. For decomposing noisy speech into its voiced and unvoiced components, a fixed segmentation may be too crude, so we propose adapting the segment length to the local characteristics of the signal. The segmentation relies on parameter estimates of a hybrid speech model, with the maximum a posteriori (MAP) and log-likelihood criteria serving as the model selection rules among the candidate segment lengths for voiced and unvoiced speech, respectively. Given the optimal segmentation markers and the estimated statistics, both components are estimated using linear filtering. A codebook-based approach differentiates between unvoiced speech and noise. Taking the adaptive segmentation into account can yield a better extraction of the components than a fixed segmentation. Compared with other decomposition methods, the approach can also achieve lower distortion for voiced speech and a higher segmental SNR (segSNR) for both components.
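
The idea of choosing segment lengths with a model selection rule can be illustrated with a small sketch. It is not the authors' method: the abstract does not say how the search over segment lengths is carried out, so the dynamic programming search, the zero-mean white Gaussian segment model, and the BIC-like penalty below are all assumptions standing in for the hybrid speech model and its MAP and log-likelihood criteria. The names `segment_cost`, `optimal_segmentation`, `min_len`, and `max_blocks` are hypothetical.

```python
import numpy as np


def segment_cost(x):
    """BIC-like cost of one segment (assumed stand-in for the paper's criteria).

    The segment is modelled as zero-mean white Gaussian noise with a single
    variance; the cost is the negative log-likelihood at the ML variance plus
    a penalty of 0.5 * log(n) for that one estimated parameter.
    """
    n = len(x)
    var = max(float(np.mean(x ** 2)), 1e-12)  # ML variance, floored for silence
    neg_log_lik = 0.5 * n * (np.log(2.0 * np.pi * var) + 1.0)
    return neg_log_lik + 0.5 * np.log(n)


def optimal_segmentation(x, min_len, max_blocks):
    """Return segment boundaries (sample indices) with minimum total cost.

    Candidate segments span 1..max_blocks consecutive blocks of min_len
    samples. Samples after the last complete block are ignored.
    """
    n_blocks = len(x) // min_len
    best = np.full(n_blocks + 1, np.inf)   # best[e]: minimum cost of blocks 0..e
    best[0] = 0.0
    back = np.zeros(n_blocks + 1, dtype=int)  # length (in blocks) of last segment
    for end in range(1, n_blocks + 1):
        for k in range(1, min(max_blocks, end) + 1):
            start = end - k
            cost = best[start] + segment_cost(x[start * min_len:end * min_len])
            if cost < best[end]:
                best[end] = cost
                back[end] = k
    # Walk the back-pointers from the last block to recover the boundaries.
    bounds = [n_blocks]
    while bounds[-1] > 0:
        bounds.append(bounds[-1] - int(back[bounds[-1]]))
    return [int(b) * min_len for b in reversed(bounds)]


# Synthetic signal whose power jumps at sample 400.
pattern = np.tile([1.0, -1.0], 200)
signal = np.concatenate([0.1 * pattern, 2.0 * pattern])
print(optimal_segmentation(signal, min_len=100, max_blocks=4))  # [0, 400, 800]
```

In this toy case the penalty term discourages splitting a stretch with constant statistics, while the likelihood term makes a segment that straddles the power change expensive, so the boundary falls at the change point. A segmentation for real speech would need segment costs derived from the hybrid model's parameter estimates, which the abstract does not specify.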


Updated: 2021-05-05
