A Benchmarking on Cloud based Speech-To-Text Services for French Speech and Background Noise Effect
arXiv - CS - Sound, Pub Date: 2021-05-07, DOI: arxiv-2105.03409
Binbin Xu, Chongyang Tao, Zidu Feng, Youssef Raqui, Sylvie Ranwez

This study presents a large-scale benchmark of cloud-based Speech-To-Text (STT) systems: Google Cloud Speech-To-Text, Microsoft Azure Cognitive Services, Amazon Transcribe, and IBM Watson Speech to Text. For each system, 40,158 clean and noisy speech files (about 101 hours in total) were tested. The effect of background noise on STT quality was also evaluated at 5 different signal-to-noise ratios (SNRs) ranging from 40 dB to 0 dB. Results showed that Microsoft Azure provided the lowest transcription error rate on clean speech (9.09%) and was highly robust to noisy environments. Google Cloud and Amazon Transcribe gave similar performance, but the latter is very limited for time-constrained use. Although IBM Watson works correctly in quiet conditions, it is highly sensitive to noisy speech, which could strongly limit its use in real-life situations.
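The abstract reports a transcription error rate and five SNR conditions but does not say here how either is computed. The following is a minimal Python sketch under two stated assumptions: that the error rate is the word error rate (WER), and that noisy files are made by scaling a noise recording to a target SNR and adding it to clean speech. Both are common conventions, not confirmed as the authors' exact procedure, and the function names `mean_power`, `mix_at_snr` and `word_error_rate` are hypothetical.

```python
import math
import random


def mean_power(signal):
    """Mean power of a sequence of samples (mean of squared values)."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR (dB).

    Assumption: simple additive mixing; the paper's exact pipeline may differ.
    From SNR_dB = 10 * log10(P_speech / (gain**2 * P_noise)), the noise gain is
    sqrt(P_speech / (P_noise * 10 ** (snr_db / 10))).
    The noise is truncated to the speech length (assumes it is at least as long).
    """
    noise = noise[: len(speech)]
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Uses a word-level Levenshtein distance computed one row at a time.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical 1-second 440 Hz tone at 16 kHz standing in for speech.
    speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [random.gauss(0.0, 1.0) for _ in range(16000)]
    noisy = mix_at_snr(speech, noise, 10)
    print(len(noisy))
    print(word_error_rate("le chat est noir", "le chat noir"))  # → 0.25
```

In the example, one of four reference words is deleted, giving a WER of 0.25. The abstract only states that the SNRs span 40 dB to 0 dB, so the intermediate levels used in the study are not given here.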

Updated: 2021-05-10