Gated Recurrent Context: Softmax-free Attention for Online Encoder-Decoder Speech Recognition
arXiv - CS - Sound. Pub Date: 2020-07-10, DOI: arxiv-2007.05214
Hyeonseung Lee, Woo Hyun Kang, Sung Jun Cheon, Hyeongju Kim, Nam Soo Kim

Recently, attention-based encoder-decoder (AED) models have shown state-of-the-art performance in automatic speech recognition (ASR). Since the original AED models with global attention are not capable of online inference, various online attention schemes have been developed to reduce ASR latency for a better user experience. However, a common limitation of the conventional softmax-based online attention approaches is that they introduce an additional hyperparameter related to the length of the attention window, requiring multiple rounds of model training to tune it. To address this problem, we propose a novel softmax-free attention method and its modified formulation for online attention, which needs no additional hyperparameter at the training phase. Through a number of ASR experiments, we demonstrate that the tradeoff between latency and performance of the proposed online attention technique can be controlled by merely adjusting a threshold at the test phase. Furthermore, the proposed methods showed performance competitive with the conventional global and online attention methods in terms of word error rate (WER).
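The abstract does not give the method's equations, but the core idea it describes — replacing the globally normalized softmax with a gated, recursively updated context, plus a test-time threshold that truncates how much of the input is consumed — can be illustrated with a minimal sketch. The recursive update rule, the sigmoid gate, and the stopping rule below are all illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_recurrent_context(values, scores, threshold=None):
    """Illustrative softmax-free attention: the context vector is updated
    recursively with a per-frame sigmoid gate, so no normalization over
    the whole utterance (softmax) is required.

    values: (T, D) encoder outputs for T frames.
    scores: (T,) unnormalized gate logits (hypothetical scoring function).
    threshold: optional test-time cutoff; stopping once the gate falls
    below it yields online (truncated-context) inference, trading
    latency against accuracy without retraining.
    """
    context = np.zeros(values.shape[1])
    frames_used = 0
    for t in range(len(values)):
        g = sigmoid(scores[t])                         # gate in (0, 1)
        context = (1.0 - g) * context + g * values[t]  # recursive blend
        frames_used = t + 1
        if threshold is not None and g < threshold:
            break  # stop consuming future frames: online decision point
    return context, frames_used
```

Because each frame's contribution depends only on past gates, the decoder never needs the full utterance, which is what makes the threshold a pure test-time knob in this sketch.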

Updated: 2020-07-24