A Benchmarking on Cloud based Speech-To-Text Services for French Speech and Background Noise Effect,arXiv - CS - Sound



arXiv - CS - Sound, Pub Date: 2021-05-07, DOI: arxiv-2105.03409
Binbin Xu, Chongyang Tao, Zidu Feng, Youssef Raqui, Sylvie Ranwez

This study presents a large-scale benchmark of four cloud-based Speech-To-Text (STT) systems: Google Cloud Speech-To-Text, Microsoft Azure Cognitive Services, Amazon Transcribe, and IBM Watson Speech to Text. For each system, 40,158 clean and noisy speech files, about 101 hours of audio, are tested. The effect of background noise on STT quality is also evaluated at 5 different signal-to-noise ratios, from 40 dB to 0 dB. Results show that Microsoft Azure provided the lowest transcription error rate, 9.09%, on clean speech, with high robustness to noisy environments. Google Cloud and Amazon Transcribe gave similar performance, but the latter is very limited for time-constrained usage. Although IBM Watson works correctly in quiet conditions, it is highly sensitive to noisy speech, which could strongly limit its application in real-life situations.
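The abstract does not say how the noisy files were generated or how the transcription error rate was computed. As a rough illustration only, the Python sketch below shows one common way to mix noise into speech at a target signal-to-noise ratio and to score a transcript with word error rate (WER). The function names, the use of WER as the metric, and the evenly spaced SNR levels in the usage note are all assumptions made for this example, not details taken from the paper.

```python
import math


def mix_at_snr(speech, noise, target_snr_db):
    """Add `noise` to `speech`, scaled so the mix has the target SNR in dB.

    Both inputs are equal-length sequences of float samples.
    Hypothetical helper, not the authors' procedure.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both be non-silent")
    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixed):
    """Recover the SNR of a mix, given the clean speech it was built from."""
    p_speech = sum(s * s for s in speech)
    p_residual = sum((m - s) ** 2 for s, m in zip(speech, mixed))
    return 10 * math.log10(p_speech / p_residual)


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)
```

With helpers like these, a benchmark loop could call `mix_at_snr` once per noise level (for instance 40, 30, 20, 10, and 0 dB, assuming even spacing), send each mixed file to an STT service, and compare the returned text with the reference transcript through `word_error_rate`.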


Updated: 2021-05-10
