Training for Speech Recognition on Coprocessors
arXiv - CS - Sound | Pub Date: 2020-03-22 | DOI: arxiv-2003.12366
Sebastian Baunsgaard, Sebastian B. Wrede, and Pınar Tozun

Automatic Speech Recognition (ASR) has increased in popularity in recent years. The evolution of processor and storage technologies has enabled more advanced ASR mechanisms, fueling the development of virtual assistants such as Amazon Alexa, Apple Siri, Microsoft Cortana, and Google Home. The interest in such assistants, in turn, has amplified the novel developments in ASR research. However, despite this popularity, there has not been a detailed training efficiency analysis of modern ASR systems. This mainly stems from: the proprietary nature of many modern applications that depend on ASR, like the ones listed above; the relatively expensive co-processor hardware that big vendors use to accelerate ASR for such applications; and the absence of well-established benchmarks. The goal of this paper is to address the latter two of these challenges. The paper first describes an ASR model, based on a deep neural network inspired by recent work in this domain, and our experiences building it. Then we evaluate this model on three CPU-GPU co-processor platforms that represent different budget categories. Our results demonstrate that utilizing hardware acceleration yields good results even without high-end equipment. While the most expensive platform (10x the price of the least expensive one) converges to the initial accuracy target 10-30% and 60-70% faster than the other two, the differences among the platforms almost disappear at slightly higher accuracy targets. In addition, our results further highlight both the difficulty of evaluating ASR systems, due to the complex, long, and resource-intensive nature of model training in this domain, and the importance of establishing benchmarks for ASR.
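To make the evaluation methodology concrete, the following is a minimal, illustrative sketch of a time-to-accuracy measurement of the kind described above: the same training loop is run on different devices and the wall-clock time to reach a fixed quality target is recorded. It assumes PyTorch, a toy LSTM+CTC acoustic model, and synthetic data; the paper's actual architecture, dataset, accuracy metric, and hardware platforms are not specified in this abstract, so every name and number below is a placeholder.

import time
import torch
import torch.nn as nn

class TinyCTCModel(nn.Module):
    """Small LSTM+CTC acoustic model (illustrative placeholder)."""
    def __init__(self, n_feats=40, hidden=256, n_tokens=29):
        super().__init__()
        self.rnn = nn.LSTM(n_feats, hidden, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, n_tokens)

    def forward(self, feats):                  # feats: (batch, time, n_feats)
        out, _ = self.rnn(feats)
        return self.proj(out)                  # logits: (batch, time, n_tokens)

def time_to_target(device, target_loss=1.0, max_steps=200):
    """Train on synthetic batches; return seconds until the CTC loss drops
    below target_loss (a stand-in for a word-error-rate target) or the step
    budget is exhausted."""
    torch.manual_seed(0)
    model = TinyCTCModel().to(device)
    opt = torch.optim.Adam(model.parameters(), lr=3e-4)
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)
    start = time.time()
    for _ in range(max_steps):
        feats = torch.randn(8, 200, 40, device=device)          # fake filterbank features
        targets = torch.randint(1, 29, (8, 30), device=device)  # fake transcripts
        in_lens = torch.full((8,), 200, dtype=torch.long)
        tgt_lens = torch.full((8,), 30, dtype=torch.long)
        log_probs = model(feats).log_softmax(-1).transpose(0, 1)  # (T, B, C) layout for CTC
        loss = ctc(log_probs, targets, in_lens, tgt_lens)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if loss.item() < target_loss:
            break
    return time.time() - start

if __name__ == "__main__":
    devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
    for dev in devices:
        print(f"{dev}: reached (or timed out on) target in {time_to_target(dev):.1f}s")

Comparing platforms by time to a fixed quality target, rather than by raw throughput, is what lets the abstract's observation emerge: the cheaper platforms close most of the gap to the most expensive one once the accuracy target is set slightly higher.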

Updated: 2020-03-30