Stereo InSE-NET: Stereo Audio Quality Predictor Transfer Learned from Mono InSE-NET
Arijit Biswas, Guanxin Jiang
arXiv - EE - Signal Processing, Pub Date: 2022-09-23, DOI: arxiv-2209.11666
Automatic coded audio quality predictors are typically designed for
evaluating single channels without considering any spatial aspects. With
InSE-NET [1], we demonstrated mimicking a state-of-the-art coded audio quality
metric (ViSQOL-v3 [2]) with deep neural networks (DNN) and subsequently
improving it, entirely with programmatically generated data. In this study,
we take steps towards building a DNN-based coded stereo audio quality predictor
and we propose an extension of the InSE-NET for handling stereo signals. The
design considers stereo/spatial aspects by conditioning the model with left,
right, mid, and side channels; and we name our model Stereo InSE-NET. By
transferring selected weights from the pre-trained mono InSE-NET and retraining
with both real and synthetically augmented listening tests, we demonstrate a
significant improvement of 12% and 6% of Pearson and Spearman Rank correlation
coefficient, respectively, over the latest ViSQOL-v3 [3].
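
The left, right, mid, and side conditioning mentioned above can be illustrated with a minimal sketch. The abstract does not state how the mid and side channels are derived, so the code assumes the common convention mid = (L + R) / 2 and side = (L - R) / 2; the function name stereo_to_lrms and the sample values are hypothetical and not taken from the paper.

```python
import numpy as np


def stereo_to_lrms(left, right):
    """Stack left, right, mid, and side into a (4, n_samples) array.

    Assumption: mid = (L + R) / 2 and side = (L - R) / 2. The paper may
    use a different scaling or operate on spectral features instead.
    """
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    if left.shape != right.shape:
        raise ValueError("left and right must have the same shape")
    mid = 0.5 * (left + right)    # content common to both channels
    side = 0.5 * (left - right)   # content that differs between channels
    return np.stack([left, right, mid, side])


# Hypothetical three-sample stereo signal.
left = np.array([1.0, 0.5, -0.25])
right = np.array([1.0, -0.5, 0.25])
channels = stereo_to_lrms(left, right)
```

Under this convention the original channels are recoverable as L = mid + side and R = mid - side, so the four-channel stack adds no new information but makes the spatial differences explicit as model input.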
Updated: 2022-09-26