Automatic Speech Recognition in Sanskrit: A New Speech Corpus and Modelling Insights

arXiv - CS - Sound, Pub Date: 2021-06-02, DOI: arxiv-2106.05852
Devaraja Adiga, Rishabh Kumar, Amrith Krishna, Preethi Jyothi, Ganesh Ramakrishnan, Pawan Goyal

Automatic speech recognition (ASR) in Sanskrit is interesting owing to the language's linguistic peculiarities: Sanskrit is lexically productive, undergoes euphonic assimilation of phones at word boundaries, and exhibits variation in spelling conventions and pronunciation. In this work, we present the first large-scale study of ASR in Sanskrit, with an emphasis on the impact of unit selection. We release a 78-hour ASR dataset for Sanskrit that faithfully captures several of the linguistic characteristics of the language. We investigate the role of different acoustic model and language model units in ASR systems for Sanskrit. We also propose a new modelling unit, inspired by syllable-level unit selection, that captures character sequences from one vowel in the word to the next vowel. We further highlight the importance of the choice of graphemic representation for Sanskrit and show its impact on word error rate (WER). Finally, we extend these insights to building ASR systems for two other Indic languages, Gujarati and Telugu. For both languages, our experimental results show that using phonetically based graphemic representations improves performance over ASR systems that use native scripts.
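
The vowel-to-vowel unit mentioned in the abstract can be illustrated with a minimal sketch. This is one plausible reading of that one-sentence description, not the authors' implementation: it assumes the word is written in the SLP1 transliteration scheme (one character per phoneme), starts a new unit at every vowel, and keeps any consonants before the first vowel as their own unit. The function name `vowel_segments` and the handling of leading consonants are assumptions made for illustration.

```python
# Assumption: input is SLP1-transliterated Sanskrit, where each vowel
# (including long vowels and diphthongs) is a single character.
SLP1_VOWELS = set("aAiIuUfFxXeEoO")


def vowel_segments(word):
    """Split a word into units that each run from one vowel up to,
    but not including, the next vowel.

    Hypothetical helper: consonants that precede the first vowel are
    returned as a separate leading unit.
    """
    segments = []
    current = ""
    for ch in word:
        # A vowel closes the unit collected so far and opens a new one.
        if ch in SLP1_VOWELS and current:
            segments.append(current)
            current = ""
        current += ch
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    # "saMskftam" is the SLP1 spelling of the word for "Sanskrit".
    print(vowel_segments("saMskftam"))
```

Concatenating the returned units always reproduces the input word, so a transcript tokenised this way can be mapped back to words without loss. How the paper treats word-initial consonants and word boundaries may differ from this sketch.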

Updated: 2021-06-11
