Full-Reference Speech Quality Estimation with Attentional Siamese Neural Networks

arXiv - CS - Sound, Pub Date: 2021-05-03, DOI: arxiv-2105.00783
Gabriel Mittag, Sebastian Möller

In this paper, we present a full-reference speech quality prediction model based on deep learning. The model computes feature representations of the reference and the degraded signal with a siamese recurrent convolutional network, which applies the same weights to both input signals. An attention mechanism then uses these features to align the two signals, and the aligned features are finally combined to estimate the overall speech quality. The proposed architecture offers a simple solution to the time-alignment problem that arises for speech signals transmitted over Voice-over-IP networks, and it shows how the clean reference signal can be incorporated into speech quality models that are based on end-to-end trained neural networks.
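
The pipeline described in the abstract (shared-weight encoding, attention-based alignment, fusion into one score) can be illustrated with a small sketch. This is not the authors' network: the recurrent convolutional encoder is replaced here by a single tanh layer, the attention is plain dot-product attention, and the fusion is a mean absolute feature difference followed by a linear layer. All function names, dimensions, and weights below are assumptions made for illustration, and the weights are random and untrained, so the printed score carries no meaning.

```python
import math
import random


def encode(frames, weights):
    """Toy stand-in for the siamese encoder: one tanh layer applied per frame."""
    return [
        [math.tanh(sum(w * x for w, x in zip(row, frame))) for row in weights]
        for frame in frames
    ]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def align(degraded_feats, reference_feats):
    """Soft-align the reference to each degraded frame with dot-product attention."""
    dim = len(reference_feats[0])
    aligned = []
    for query in degraded_feats:
        scores = [sum(q * k for q, k in zip(query, key)) for key in reference_feats]
        attn = softmax(scores)
        aligned.append(
            [sum(a * key[d] for a, key in zip(attn, reference_feats)) for d in range(dim)]
        )
    return aligned


def predict_quality(reference, degraded, enc_weights, out_weights, out_bias):
    """Encode both signals with the SAME weights, align them, fuse into one score."""
    ref_feats = encode(reference, enc_weights)
    deg_feats = encode(degraded, enc_weights)  # shared weights: the siamese part
    aligned_ref = align(deg_feats, ref_feats)
    n_frames, dim = len(deg_feats), len(enc_weights)
    # Fusion (assumed): mean absolute feature difference over time, then a linear layer.
    pooled = [
        sum(abs(d[i] - r[i]) for d, r in zip(deg_feats, aligned_ref)) / n_frames
        for i in range(dim)
    ]
    return out_bias + sum(w * p for w, p in zip(out_weights, pooled))


# Hypothetical toy data: 6 reference frames, and a degraded copy delayed by 2 silent frames.
random.seed(0)
FEAT_DIM, HIDDEN_DIM = 4, 3
enc_weights = [[random.uniform(-1, 1) for _ in range(FEAT_DIM)] for _ in range(HIDDEN_DIM)]
out_weights = [random.uniform(-1, 1) for _ in range(HIDDEN_DIM)]
reference = [[random.uniform(-1, 1) for _ in range(FEAT_DIM)] for _ in range(6)]
degraded = [[0.0] * FEAT_DIM] * 2 + reference
score = predict_quality(reference, degraded, enc_weights, out_weights, out_bias=3.0)
print(score)
```

Because the attention step matches every degraded frame against all reference frames, the two signals may differ in length and delay without an explicit time-alignment stage, which is the property the abstract highlights for Voice-over-IP transmission.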

Updated: 2021-05-04