Reasons why current speech-enhancement algorithms do not improve speech intelligibility and suggested solutions. IEEE/ACM Transactions on Audio, Speech, and Language Processing


IEEE/ACM Transactions on Audio, Speech, and Language Processing (IF 4.1), Pub Date: 2011-01-01, DOI: 10.1109/tasl.2010.2045180
Philipos C Loizou, Gibak Kim

Existing speech enhancement algorithms can improve speech quality but not speech intelligibility, and the reasons for that are unclear. This paper presents a theoretical framework that can be used to analyze potential factors that can influence the intelligibility of processed speech. More specifically, this framework focuses on the fine-grain analysis of the distortions introduced by speech enhancement algorithms. It is hypothesized that if these distortions are properly controlled, then large gains in intelligibility can be achieved. To test this hypothesis, intelligibility tests are conducted with human listeners, who are presented with processed speech containing controlled speech distortions. The aim of these tests is to assess the perceptual effect on speech intelligibility of the various distortions that speech enhancement algorithms can introduce. Results with three different enhancement algorithms indicated that certain distortions are more detrimental to speech intelligibility than others. When these distortions were properly controlled, however, large gains in intelligibility were obtained by human listeners, even with spectral-subtractive algorithms, which are known to degrade speech quality and intelligibility.
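As a rough illustration of what a fine-grain distortion analysis could look like, the sketch below labels each time-frequency bin by whether the processed magnitude overshoots or undershoots the clean magnitude, and then caps the overshoot. The abstract does not state which distortion categories the framework uses or how they are controlled, so the amplification/attenuation split, the function names `classify_distortions` and `limit_amplification`, and the clamping rule are all assumptions made for illustration, not the authors' procedure. Note that any such control needs the clean magnitudes, which are available only in a controlled experiment.

```python
from enum import Enum


class Distortion(Enum):
    # Hypothetical categories; the abstract does not name them.
    ATTENUATION = "attenuation"      # processed magnitude <= clean magnitude
    AMPLIFICATION = "amplification"  # processed magnitude > clean magnitude


def classify_distortions(clean_mag, processed_mag):
    """Label each bin by the sign of the magnitude error (processed - clean)."""
    if len(clean_mag) != len(processed_mag):
        raise ValueError("magnitude sequences must have equal length")
    labels = []
    for clean, processed in zip(clean_mag, processed_mag):
        if processed > clean:
            labels.append(Distortion.AMPLIFICATION)
        else:
            labels.append(Distortion.ATTENUATION)
    return labels


def limit_amplification(clean_mag, processed_mag):
    """Clamp amplified bins to the clean magnitude (illustrative control rule)."""
    if len(clean_mag) != len(processed_mag):
        raise ValueError("magnitude sequences must have equal length")
    return [min(clean, processed)
            for clean, processed in zip(clean_mag, processed_mag)]


if __name__ == "__main__":
    clean = [1.0, 2.0, 0.5]       # made-up clean spectral magnitudes
    processed = [1.5, 1.0, 0.5]   # made-up enhanced spectral magnitudes
    print([label.value for label in classify_distortions(clean, processed)])
    print(limit_amplification(clean, processed))
```

In a real experiment the magnitudes would come from a short-time Fourier transform of the clean and enhanced signals; plain lists are used here only to keep the sketch dependency-free.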


Last updated: 2019-11-01
