Character-Aware Attention-Based End-to-End Speech Recognition,arXiv - CS - Sound


Character-Aware Attention-Based End-to-End Speech Recognition
arXiv - CS - Sound, Pub Date: 2020-01-06, DOI: arxiv-2001.01795
Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong

Predicting words and subword units (WSUs) as the output has been shown to be effective for the attention-based encoder-decoder (AED) model in end-to-end speech recognition. However, as one input to the decoder recurrent neural network (RNN), each WSU embedding is learned independently through context and acoustic information in a purely data-driven fashion. Little effort has been made to explicitly model the morphological relationships among WSUs. In this work, we propose a novel character-aware (CA) AED model in which each WSU embedding is computed by summarizing the embeddings of its constituent characters using a CA-RNN. This WSU-independent CA-RNN is jointly trained with the encoder, the decoder and the attention network of a conventional AED to predict WSUs. With CA-AED, the embeddings of morphologically similar WSUs are naturally and directly correlated through the CA-RNN, in addition to the semantic and acoustic relations modeled by a traditional AED. Moreover, CA-AED significantly reduces the number of model parameters relative to a traditional AED by replacing the large pool of WSU embeddings with a much smaller set of character embeddings. On a 3400-hour Microsoft Cortana dataset, CA-AED achieves up to 11.9% relative WER improvement over a strong AED baseline with 27.1% fewer model parameters.
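The character-aware embedding idea can be illustrated with a small sketch. The abstract does not specify the CA-RNN cell type, the dimensions, or the character inventory, so everything below is an assumption made for illustration: a plain tanh recurrent cell, a lowercase Latin alphabet, randomly initialized (untrained) weights, and the hypothetical names `CharAwareEmbedder` and `lookup_table_parameters`. It only covers how a WSU embedding might be composed from character embeddings; the encoder, attention network, decoder and joint training are not shown.

```python
import numpy as np


class CharAwareEmbedder:
    """Hypothetical sketch: compose a WSU embedding from its characters
    with a plain tanh RNN (the cell type is an assumption, not taken
    from the paper)."""

    def __init__(self, alphabet, char_dim=8, embed_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.char_to_id = {ch: i for i, ch in enumerate(alphabet)}
        # One small embedding per character, shared by every WSU.
        self.char_emb = rng.normal(0.0, 0.1, (len(alphabet), char_dim))
        # Recurrent cell weights, also shared by every WSU.
        self.w_in = rng.normal(0.0, 0.1, (char_dim, embed_dim))
        self.w_rec = rng.normal(0.0, 0.1, (embed_dim, embed_dim))
        self.bias = np.zeros(embed_dim)

    def embed(self, wsu):
        """Run the RNN over the characters of `wsu`; the final hidden
        state serves as the WSU embedding."""
        h = np.zeros(self.bias.shape[0])
        for ch in wsu:
            x = self.char_emb[self.char_to_id[ch]]
            h = np.tanh(x @ self.w_in + h @ self.w_rec + self.bias)
        return h

    def num_parameters(self):
        """Total number of parameters; independent of the WSU vocabulary size."""
        return sum(a.size for a in (self.char_emb, self.w_in, self.w_rec, self.bias))


def lookup_table_parameters(vocab_size, embed_dim):
    """Parameters of a conventional table with one free vector per WSU."""
    return vocab_size * embed_dim


embedder = CharAwareEmbedder("abcdefghijklmnopqrstuvwxyz")
play = embedder.embed("play")
playing = embedder.embed("playing")
print(play.shape, playing.shape)
print(embedder.num_parameters(), lookup_table_parameters(30000, 16))
```

Because "play" and "playing" pass through the same character embeddings and the same recurrent weights, their vectors are tied together by construction rather than learned as unrelated rows of a table. The parameter count of the sketch depends on the alphabet size and the cell dimensions, not on the number of WSUs, which is the mechanism the abstract credits for the smaller model; the 30000-entry vocabulary used in the comparison is an arbitrary illustrative figure.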


Updated: 2020-01-08
