Far-Field Automatic Speech Recognition
Proceedings of the IEEE (IF 20.6), Pub Date: 2021-02-01, DOI: 10.1109/jproc.2020.3018668
Reinhold Haeb-Umbach, Jahn Heymann, Lukas Drude, Shinji Watanabe, Marc Delcroix, Tomohiro Nakatani

The machine recognition of speech spoken at a distance from the microphones, known as far-field automatic speech recognition (ASR), has received a significant increase in attention in science and industry, which caused or was caused by an equally significant improvement in recognition accuracy. Meanwhile, it has entered the consumer market with digital home assistants with a spoken language interface being its most prominent application. Speech recorded at a distance is affected by various acoustic distortions, and consequently, quite different processing pipelines have emerged compared with ASR for close-talk speech. A signal enhancement front end for dereverberation, source separation, and acoustic beamforming is employed to clean up the speech, and the back-end ASR engine is robustified by multicondition training and adaptation. We will also describe the so-called end-to-end approach to ASR, which is a new promising architecture that has recently been extended to the far-field scenario. This tutorial article gives an account of the algorithms used to enable accurate speech recognition from a distance, and it will be seen that, although deep learning has a significant share in the technological breakthroughs, a clever combination with traditional signal processing can lead to surprisingly effective solutions.

Updated: 2021-02-01