Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition,arXiv - CS - Sound



arXiv - CS - Sound. Pub Date: 2020-01-06, DOI: arxiv-2001.01798
Zhong Meng, Jinyu Li, Yashesh Gaur, Yifan Gong

Teacher-student (T/S) learning has been shown to be effective for domain adaptation of deep neural network acoustic models in hybrid speech recognition systems. In this work, we extend T/S learning to large-scale unsupervised domain adaptation of an attention-based end-to-end (E2E) model through two levels of knowledge transfer: the teacher's token posteriors as soft labels and its one-best predictions as decoder guidance. To further improve T/S learning with the help of ground-truth labels, we propose adaptive T/S (AT/S) learning. Instead of conditionally choosing either the teacher's soft token posteriors or the one-hot ground-truth label, in AT/S the student always learns from both the teacher and the ground truth. A pair of adaptive weights is assigned to the soft and one-hot labels, quantifying the confidence in each knowledge source. The confidence scores are dynamically estimated at each decoder step as a function of the soft and one-hot labels. With 3400 hours of parallel close-talk and far-field Microsoft Cortana data for domain adaptation, T/S and AT/S achieve 6.3% and 10.3% relative word error rate improvements, respectively, over a strong E2E model trained with the same amount of far-field data.
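
To make the AT/S idea concrete, the following is a minimal sketch of a per-step loss that mixes the teacher's soft labels with the one-hot ground truth. The abstract does not give the weighting formula, so the choice here is an assumption made for illustration only: the teacher's confidence is taken to be the probability the teacher assigns to the ground-truth token, and the ground-truth weight is its complement. The function name `adaptive_ts_loss` and its arguments are hypothetical and do not come from the paper.

```python
import math

def adaptive_ts_loss(student_probs, teacher_probs, truth_ids):
    """Sum over decoder steps of a weighted soft-label + one-hot loss.

    student_probs, teacher_probs: one token distribution (list of floats)
    per decoder step. truth_ids: ground-truth token index per step.
    """
    total = 0.0
    for s, t, y in zip(student_probs, teacher_probs, truth_ids):
        # ASSUMPTION: confidence in the teacher is the probability it puts
        # on the ground-truth token; the paper's estimator may differ.
        w_teacher = t[y]
        w_truth = 1.0 - w_teacher
        # Cross-entropy of the student against the teacher's soft labels.
        soft_ce = -sum(tp * math.log(sp) for tp, sp in zip(t, s) if tp > 0.0)
        # Cross-entropy of the student against the one-hot ground truth.
        hard_ce = -math.log(s[y])
        total += w_teacher * soft_ce + w_truth * hard_ce
    return total

# Toy example: two decoder steps over a three-token vocabulary.
student = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
teacher = [[0.9, 0.05, 0.05], [0.2, 0.5, 0.3]]
truth = [0, 1]
print(adaptive_ts_loss(student, teacher, truth))
```

Under this assumed weighting, a step where the teacher is confident in the correct token leans on the soft labels, while a step where the teacher is unsure or wrong leans on the ground truth. Both terms are always active, which is the contrast with conditionally picking one source.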


Updated: 2020-01-08
