Curriculum Learning based approaches for robust end-to-end far-field speech recognition
Speech Communication (IF 2.4), Pub Date: 2021-06-18, DOI: 10.1016/j.specom.2021.06.003
Shivesh Ranjan, John H.L. Hansen

Automatic Speech Recognition (ASR) systems are known to degrade considerably on speech captured in far-field conditions, and far-field ASR has consequently received considerable attention in recent years. Motivated by our recent work using Curriculum Learning (CL) based strategies to improve Speaker Identification (SID) under noisy and degraded conditions, this study proposes CL based approaches to improve far-field ASR. Specifically, we propose a CL based approach for training a Bidirectional Long Short-Term Memory (BLSTM) based ASR network with the Connectionist Temporal Classification (CTC) objective function. Training starts with comparatively easier near-field data, and more diverse (difficult) far-field data is included progressively in the later stages of training. The proposed approaches are shown to significantly outperform the baseline BLSTM ASR system, offering relative word error rate (WER) reductions of up to 7.3% and 10.1% on the dev and eval sets, respectively, of the AMI far-field voice capture corpus.
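The easy-to-hard data schedule described in the abstract can be illustrated with a minimal sketch. The abstract does not state how many stages are used or how much far-field data enters at each stage, so the function name `curriculum_stages`, the `far_fractions` values, and the `(utterance_id, condition)` representation below are all hypothetical assumptions for illustration, not the authors' recipe. The BLSTM model and CTC training loop are left out entirely; the sketch only builds the per-stage training pools.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical utterance record: (utterance_id, condition), condition is "near" or "far".
Utterance = Tuple[str, str]


def curriculum_stages(
    near_field: Sequence[Utterance],
    far_field: Sequence[Utterance],
    far_fractions: Sequence[float] = (0.0, 0.5, 1.0),  # assumed schedule, not from the paper
    seed: int = 0,
) -> List[List[Utterance]]:
    """Return one training pool per stage; the far-field share grows stage by stage."""
    if any(not 0.0 <= f <= 1.0 for f in far_fractions):
        raise ValueError("far_fractions must lie in [0, 1]")
    if list(far_fractions) != sorted(far_fractions):
        raise ValueError("far_fractions must be non-decreasing")

    rng = random.Random(seed)
    # Fix one random order of the far-field data so each stage extends the previous one.
    far_order = list(far_field)
    rng.shuffle(far_order)

    stages: List[List[Utterance]] = []
    for frac in far_fractions:
        n_far = int(frac * len(far_order))
        # Every stage keeps all near-field data and adds the first n_far far-field items.
        pool = list(near_field) + far_order[:n_far]
        rng.shuffle(pool)  # mix conditions within the stage
        stages.append(pool)
    return stages


if __name__ == "__main__":
    near = [(f"near{i}", "near") for i in range(4)]
    far = [(f"far{i}", "far") for i in range(4)]
    for stage, pool in enumerate(curriculum_stages(near, far)):
        n_far = sum(1 for _, cond in pool if cond == "far")
        print(f"stage {stage}: {len(pool)} utterances, {n_far} far-field")
```

In a real system each pool would feed one phase of network training, and the difficulty ordering could come from something other than a fixed fraction schedule; this sketch only shows the progressive inclusion of far-field data.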




Updated: 2021-07-08