Error-driven Fixed-Budget ASR Personalization for Accented Speakers
arXiv - CS - Sound. Pub Date: 2021-03-04, DOI: arxiv-2103.03142
Abhijeet Awasthi, Aman Kansal, Sunita Sarawagi, Preethi Jyothi

We consider the task of personalizing ASR models while being constrained by a fixed budget on recording speaker-specific utterances. Given a speaker and an ASR model, we propose a method of identifying sentences for which the speaker's utterances are likely to be harder for the given ASR model to recognize. We assume a tiny amount of speaker-specific data to learn phoneme-level error models which help us select such sentences. We show that the speaker's utterances on the sentences selected using our error model indeed have higher error rates than the speaker's utterances on randomly selected sentences. We find that fine-tuning the ASR model on the sentence utterances selected with the help of error models yields larger WER improvements than fine-tuning on an equal number of randomly selected sentence utterances. Thus, our method provides an efficient way of collecting speaker utterances under budget constraints for personalizing ASR models.
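The selection step described above can be illustrated with a minimal sketch, not the authors' implementation: it assumes per-phoneme error rates have been estimated by aligning ASR hypotheses against references on the tiny speaker-specific seed set, and it ranks candidate sentences by their expected phoneme error before spending the recording budget. All names here (phoneme_error_rates, expected_difficulty, select_sentences, default_rate) are hypothetical placeholders.

from collections import Counter
from typing import Dict, List, Tuple

def phoneme_error_rates(aligned_pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Estimate P(error | phoneme) from (reference_phoneme, hypothesis_phoneme)
    pairs obtained by aligning ASR output with references on the small
    speaker-specific seed set."""
    errors, totals = Counter(), Counter()
    for ref_ph, hyp_ph in aligned_pairs:
        totals[ref_ph] += 1
        if ref_ph != hyp_ph:
            errors[ref_ph] += 1
    return {ph: errors[ph] / totals[ph] for ph in totals}

def expected_difficulty(sentence_phonemes: List[str],
                        error_rates: Dict[str, float],
                        default_rate: float = 0.1) -> float:
    """Score a candidate sentence by its average predicted phoneme error rate;
    phonemes unseen in the seed set fall back to a default prior."""
    if not sentence_phonemes:
        return 0.0
    return sum(error_rates.get(ph, default_rate) for ph in sentence_phonemes) / len(sentence_phonemes)

def select_sentences(candidates: List[List[str]],
                     error_rates: Dict[str, float],
                     budget: int) -> List[int]:
    """Return indices of the `budget` candidate sentences predicted to be
    hardest for this speaker; these would then be recorded and used to
    fine-tune the ASR model."""
    ranked = sorted(range(len(candidates)),
                    key=lambda i: expected_difficulty(candidates[i], error_rates),
                    reverse=True)
    return ranked[:budget]

In this sketch the error model is a simple per-phoneme confusion rate; the paper's error models operate at the phoneme level in a comparable spirit, and the key point is only that sentences are ranked by predicted difficulty rather than chosen at random.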

Updated: 2021-03-05