Objective Metrics to Evaluate Residual-Echo Suppression During Double-Talk
arXiv - CS - Sound, Pub Date: 2021-07-15, DOI: arxiv-2107.07471
Amir Ivry, Israel Cohen, Baruch Berdugo

Human subjective evaluation is optimal to assess speech quality for human perception. The recently introduced deep noise suppression mean opinion score (DNSMOS) metric was shown to estimate human ratings with great accuracy. The signal-to-distortion ratio (SDR) metric is widely used to evaluate residual-echo suppression (RES) systems by estimating speech quality during double-talk. However, since the SDR is affected by both speech distortion and residual-echo presence, it does not correlate well with human ratings according to the DNSMOS. To address that, we introduce two objective metrics to separately quantify the desired-speech maintained level (DSML) and residual-echo suppression level (RESL) during double-talk. These metrics are evaluated using a deep learning-based RES-system with a tunable design parameter. Using 280 hours of real and simulated recordings, we show that the DSML and RESL correlate well with the DNSMOS with high generalization to various setups. Also, we empirically investigate the relation between tuning the RES-system design parameter and the DSML-RESL tradeoff it creates and offer a practical design scheme for dynamic system requirements.
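
The abstract's point that the SDR mixes two different kinds of error can be illustrated with a small sketch. It assumes the common definition SDR = 10 log10(||s||^2 / ||s - s_hat||^2), which the abstract itself does not spell out, and it uses toy sinusoids in place of speech and echo. The function name `sdr_db` and all signal parameters are hypothetical. The DSML and RESL definitions are not given in the abstract and are not reproduced here.

```python
import math


def sdr_db(desired, estimate):
    """SDR in dB, assuming SDR = 10*log10(||s||^2 / ||s - s_hat||^2)."""
    signal_energy = sum(s * s for s in desired)
    error_energy = sum((s - e) ** 2 for s, e in zip(desired, estimate))
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


# Toy stand-ins (assumptions): one sinusoid plays the desired speech,
# another plays the residual echo. Both have equal energy.
n = 1000
desired = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
echo = [math.sin(2 * math.pi * 13 * t / n) for t in range(n)]

# Case A: desired speech is attenuated (distortion), no echo remains.
distorted_only = [0.9 * s for s in desired]

# Case B: desired speech is intact, but some echo leaks through.
echo_only = [s + 0.1 * r for s, r in zip(desired, echo)]

print(f"SDR, distortion only:    {sdr_db(desired, distorted_only):.2f} dB")
print(f"SDR, residual echo only: {sdr_db(desired, echo_only):.2f} dB")
```

Both cases come out at about 20 dB, even though the first output contains only speech distortion and the second only residual echo. Under these assumptions a single SDR value cannot tell the two failure modes apart, which is the motivation the abstract gives for reporting them separately.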

Updated: 2021-07-16