Full-Reference Speech Quality Estimation with Attentional Siamese Neural Networks
arXiv - CS - Sound. Pub Date: 2021-05-03. DOI: arxiv-2105.00783. Authors: Gabriel Mittag, Sebastian Möller
In this paper, we present a full-reference speech quality prediction model
with a deep learning approach. The model determines a feature representation of
the reference and the degraded signal through a siamese recurrent convolutional
network that shares the weights for both signals as input. The resulting
features are then used to align the signals with an attention mechanism and are
finally combined to estimate the overall speech quality. The proposed network
architecture represents a simple solution for the time-alignment problem that
occurs for speech signals transmitted through Voice-Over-IP networks and shows
how the clean reference signal can be incorporated into speech quality models
that are based on end-to-end trained neural networks.
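
The three steps named in the abstract (shared-weight feature extraction, attention-based alignment, combination into one score) can be illustrated with a toy sketch. This is not the authors' implementation. The function names encode, attend, align and distance_score are hypothetical, a per-frame linear layer with tanh stands in for the siamese recurrent convolutional network, and a fixed mean absolute feature distance stands in for the trained quality estimator.

```python
import math


def encode(frames, weights):
    """Shared (siamese) encoder: the same weights are applied to every frame
    of either signal. Assumption: a linear layer plus tanh replaces the
    recurrent convolutional network described in the abstract."""
    return [[math.tanh(sum(w * x for w, x in zip(row, frame))) for row in weights]
            for frame in frames]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(deg_feats, ref_feats, scale=1.0):
    """For each degraded frame, return attention weights over all reference
    frames, using scaled dot-product scores."""
    return [softmax([scale * sum(d * r for d, r in zip(df, rf)) for rf in ref_feats])
            for df in deg_feats]


def align(attn, ref_feats):
    """Soft alignment: one attention-weighted average of the reference
    features per degraded frame."""
    dim = len(ref_feats[0])
    return [[sum(a * rf[k] for a, rf in zip(row, ref_feats)) for k in range(dim)]
            for row in attn]


def distance_score(deg_feats, ref_like_feats):
    """Mean absolute feature difference pooled over time and feature
    dimensions. Assumption: a stand-in for the learned quality head, so
    lower means 'closer to the reference', not a quality rating."""
    diffs = [abs(d - r)
             for df, rf in zip(deg_feats, ref_like_feats)
             for d, r in zip(df, rf)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Toy signals: four frames with four values each; the "degraded" signal
    # is the reference delayed cyclically by one frame.
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    reference = [[3.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    degraded = reference[1:] + reference[:1]

    ref_feats = encode(reference, identity)
    deg_feats = encode(degraded, identity)
    attn = attend(deg_feats, ref_feats, scale=20.0)

    print("distance without alignment:", distance_score(deg_feats, ref_feats))
    print("distance with alignment:", distance_score(deg_feats, align(attn, ref_feats)))
```

In this toy setting the delay makes a frame-by-frame comparison look poor even though the content is identical, while comparing against the attention-aligned reference removes most of that penalty. That is the time-alignment problem the abstract refers to for Voice-over-IP transmission.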
Updated: 2021-05-04