Speech Quality Factors for Traditional and Neural-Based Low Bit Rate Vocoders
arXiv - CS - Sound, Pub Date: 2020-03-26, DOI: arxiv-2003.11882
Wissam A. Jassim, Jan Skoglund, Michael Chinen, Andrew Hines

This study compares the performance of different algorithms for coding speech at low bit rates. In addition to widely deployed traditional vocoders, a selection of recently developed generative-model-based coders at different bit rates is contrasted. The coded speech is evaluated for different quality aspects: accuracy of pitch period estimation, word error rates for automatic speech recognition, and the influence of speaker gender and coding delays. A number of performance metrics computed on speech samples taken from a publicly available database were compared with subjective scores. Results from subjective quality assessment do not correlate well with existing full-reference speech quality metrics. The results provide valuable insights into aspects of the speech signal that will be used to develop a novel metric to accurately predict speech quality from generative-model-based coders.
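
The abstract names the word error rate (WER) of automatic speech recognition as one of the quality aspects but does not say how it is computed. As a rough illustration only, the sketch below uses the textbook definition: the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example, not details taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER: (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; tokenization is plain whitespace
    splitting, which is an assumption rather than the paper's procedure.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a coder that causes the recognizer to drop one word of a four-word reference yields a WER of 0.25, and the value can exceed 1.0 when the hypothesis contains many inserted words.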

Updated: 2020-03-27