KARI: KAnari/QCRI's End-to-End systems for the INTERSPEECH 2021 Indian Languages Code-Switching Challenge,arXiv - CS - Human-Computer Interaction

arXiv - CS - Human-Computer Interaction. Pub Date: 2021-06-10, DOI: arxiv-2106.05885
Amir Hussein, Shammur Chowdhury, Ahmed Ali

In this paper, we present the Kanari/QCRI (KARI) system and the modeling strategies used to participate in the Interspeech 2021 Code-switching (CS) challenge for low-resource Indian languages. The subtask involved developing a speech recognition system for two CS datasets, Hindi-English and Bengali-English, collected in real-life scenarios. To tackle the CS challenges, we use transfer learning to incorporate publicly available monolingual Hindi, Bengali, and English speech data. In this work, we study the effectiveness of a two-step transfer learning protocol for low-resource CS data: monolingual pretraining followed by fine-tuning. For acoustic modeling, we develop an end-to-end convolution-augmented transformer (Conformer). We show that the percentage selected from each monolingual dataset affects the model's bias toward using one language's character set over the other in a CS scenario. Models pretrained on well-aligned and accurate monolingual data showed robustness to misalignment between the segments and their transcriptions. Finally, we develop word-level n-gram language models (LMs) to rescore the ASR output.
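
The abstract does not detail how the n-gram LMs are applied during rescoring. As a hedged illustration only, here is generic N-best rescoring with a word-level bigram LM. The following are all hypothetical and are not the KARI configuration:

- the add-one smoothing,
- the linear score combination weighted by `lm_weight`,
- the function names `train_bigram_lm` and `rescore`,
- the toy Hindi-English sentences.

The toy data also touches on the character-set issue raised above. The same English word can surface in either script, and an LM trained on Latin-script English pushes the output toward Latin script.

```python
import math
from collections import Counter

# Hypothetical sketch of N-best rescoring with a word-level bigram LM;
# smoothing, weighting, names, and data are illustrative assumptions.


def train_bigram_lm(sentences, alpha=1.0):
    """Train an add-alpha smoothed word bigram LM; return a log-probability scorer."""
    context_counts = Counter()  # how often each word appears as a bigram history
    bigram_counts = Counter()
    vocab = set()               # words that can be predicted (includes </s>)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(words[1:])
        context_counts.update(words[:-1])
        bigram_counts.update(zip(words[:-1], words[1:]))
    vocab_size = len(vocab)

    def log_prob(sentence):
        words = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigram_counts[(prev, cur)] + alpha)
                     / (context_counts[prev] + alpha * vocab_size))
            for prev, cur in zip(words[:-1], words[1:])
        )

    return log_prob


def rescore(nbest, lm_log_prob, lm_weight=0.5):
    """Pick the hypothesis maximizing acoustic score + lm_weight * LM log-prob.

    nbest: list of (hypothesis_text, acoustic_log_score) pairs from the decoder.
    """
    best_text, _ = max(nbest, key=lambda h: h[1] + lm_weight * lm_log_prob(h[0]))
    return best_text


# Toy Hindi-English LM text (hypothetical), with English words in Latin script.
lm_text = ["मेरा phone काम नहीं कर रहा", "phone काम कर रहा है"]
# Hypothetical 2-best list: the top acoustic hypothesis writes "phone" in Devanagari.
nbest = [("मेरा फ़ोन काम नहीं कर रहा", -10.0), ("मेरा phone काम नहीं कर रहा", -10.5)]

lm = train_bigram_lm(lm_text)
print(rescore(nbest, lm, lm_weight=0.5))  # LM flips the choice to Latin-script "phone"
```

Setting `lm_weight=0.0` recovers the acoustic-only choice. In practice, such an interpolation weight would be tuned on a development set.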

Updated: 2021-06-11