Controlling the Remixing of Separated Dialogue with a Non-Intrusive Quality Estimate, arXiv - CS - Sound



arXiv - CS - Sound. Pub Date: 2021-07-21, DOI: arxiv-2107.10151
Matteo Torcoli, Jouni Paulus, Thorsten Kastner, Christian Uhle

Remixing separated audio sources trades off interferer attenuation against the amount of audible deteriorations. This paper proposes a non-intrusive audio quality estimation method for controlling this trade-off in a signal-adaptive manner. The recently proposed 2f-model is adopted as the underlying quality measure, since it has been shown to correlate strongly with basic audio quality in source separation. An alternative operation mode of the measure is proposed, more appropriate when considering material with long inactive periods of the target source. The 2f-model requires the reference target source as an input, but this is not available in many applications. Deep neural networks (DNNs) are trained to estimate the 2f-model intrusively using the reference target (iDNN2f), non-intrusively using the input mix as reference (nDNN2f), and reference-free using only the separated output signal (rDNN2f). It is shown that iDNN2f achieves very strong correlation with the original measure on the test data (Pearson r=0.99), while performance decreases for nDNN2f (r>=0.91) and rDNN2f (r>=0.82). The non-intrusive estimate nDNN2f is mapped to select item-dependent remixing gains with the aim of maximizing the interferer attenuation under a constraint on the minimum quality of the remixed output (e.g., audible but not annoying deteriorations). A listening test shows that this is successfully achieved even with very different selected gains (up to 23 dB difference).
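The gain selection described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the remix is modelled as the separated dialogue plus an attenuated residual (input mix minus separated dialogue), the search is a plain grid over candidate attenuations, and `placeholder_quality` is a hypothetical stand-in for a non-intrusive estimator such as nDNN2f (it simply scales a correlation with the input mix to 0-100 and has no perceptual meaning).

```python
import numpy as np


def remix(dialogue_est, mix, attenuation_db):
    """Remix the separated dialogue with an attenuated residual.

    Assumed signal model: residual = mix - dialogue_est.
    """
    residual = mix - dialogue_est
    gain = 10.0 ** (-attenuation_db / 20.0)  # dB attenuation -> linear gain
    return dialogue_est + gain * residual


def select_attenuation(dialogue_est, mix, estimate_quality, min_quality, candidates_db):
    """Return the largest candidate attenuation (in dB) whose estimated
    quality is at least min_quality.

    estimate_quality(out, mix) is any callable returning a score; it takes
    the input mix as reference, mirroring the non-intrusive setting.
    Falls back to the smallest candidate if none meets the constraint.
    """
    best = min(candidates_db)
    for att in sorted(candidates_db):
        out = remix(dialogue_est, mix, att)
        if estimate_quality(out, mix) >= min_quality:
            best = att
    return best


def placeholder_quality(out, mix):
    """Hypothetical stand-in for a learned quality estimate (NOT nDNN2f)."""
    r = np.corrcoef(out, mix)[0, 1]
    return 100.0 * max(r, 0.0)


# Synthetic demo signals (assumed, not data from the paper).
rng = np.random.default_rng(0)
dialogue = rng.standard_normal(16000)
background = 0.5 * rng.standard_normal(16000)
mix = dialogue + background
dialogue_est = dialogue + 0.1 * rng.standard_normal(16000)  # imperfect separation

chosen = select_attenuation(
    dialogue_est, mix, placeholder_quality,
    min_quality=95.0, candidates_db=[0, 3, 6, 9, 12, 15, 18],
)
print(f"selected attenuation: {chosen} dB")
```

Because the search is repeated for each item, different items can end up with different attenuations under the same quality constraint, which is the item-dependent behaviour the abstract refers to.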


Updated: 2021-07-22
