A baseline model for computationally inexpensive speech recognition for Kazakh using the Coqui STT framework
arXiv - CS - Sound Pub Date : 2021-07-19 , DOI: arxiv-2107.10637
Ilnar Salimzianov

Mobile devices are transforming the way people interact with computers, and speech interfaces to applications are ever more important. Recently published Automatic Speech Recognition (ASR) systems are very accurate, but often require powerful machinery (specialised Graphics Processing Units) for inference, which makes them impractical to run on commodity devices, especially in streaming mode. Impressed by the accuracy of the baseline Kazakh ASR model of Khassanov et al. (2021), but dissatisfied with its inference times when not using a GPU, we trained a new baseline acoustic model (on the same dataset as the aforementioned paper) and three language models for use with the Coqui STT framework. Results look promising, but further epochs of training and parameter sweeping or, alternatively, limiting the vocabulary that the ASR system must support, are needed to reach production-level accuracy.

Updated: 2021-07-23