Online Automatic Speech Recognition with Listen, Attend and Spell Model, IEEE Signal Processing Letters

Online Automatic Speech Recognition with Listen, Attend and Spell Model
IEEE Signal Processing Letters (IF 3.2), Pub Date: 2020-01-01, DOI: 10.1109/lsp.2020.3031480
Roger Hsiao, Dogan Can, Tim Ng, Ruchir Travadi, Arnab Ghoshal

The Listen, Attend and Spell (LAS) model and other attention-based automatic speech recognition (ASR) models have known limitations when operated in a fully online mode. In this letter, we analyze the online operation of LAS models to demonstrate that these limitations stem from the handling of silence regions and the reliability of the online attention mechanism at the edges of input buffers. We propose a novel and simple technique that achieves fully online recognition while meeting accuracy and latency targets. On a Mandarin dictation task, the proposed approach achieves a character error rate in online operation that is within 4% relative of that of an offline LAS model. The proposed online LAS model operates at 12% lower latency than a conventional neural network hidden Markov model hybrid of comparable accuracy. We have validated the proposed method through a production-scale deployment, which, to the best of our knowledge, is the first such deployment of a fully online LAS model.
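
The abstract states its results as relative figures (character error rate within 4% relative, latency 12% lower). As a minimal sketch of how such figures are commonly computed, the Python below defines character error rate as Levenshtein edit distance divided by reference length, plus a relative-change helper. The function names and all numeric values in the usage lines are illustrative assumptions; they are not taken from the letter, which does not report absolute numbers in its abstract.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Character errors divided by the number of reference characters."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_change(baseline: float, value: float) -> float:
    """Signed change of `value` with respect to `baseline`, as a fraction."""
    return (value - baseline) / baseline


# Hypothetical numbers, chosen only to illustrate the arithmetic.
offline_cer, online_cer = 0.100, 0.104
print(f"CER degradation: {relative_change(offline_cer, online_cer):+.1%}")

hybrid_latency_ms, online_las_latency_ms = 100.0, 88.0
print(f"Latency change: {relative_change(hybrid_latency_ms, online_las_latency_ms):+.1%}")
```

Under these made-up inputs, an online CER of 0.104 against an offline CER of 0.100 is a 4% relative degradation even though the absolute gap is only 0.4 percentage points, which is why relative and absolute figures should not be confused when reading the abstract.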

Updated: 2020-01-01
