Deformable TDNN with adaptive receptive fields for speech recognition


arXiv - CS - Sound. Pub Date: 2021-04-30, DOI: arxiv-2104.14791
Keyu An, Yi Zhang, Zhijian Ou

Time Delay Neural Networks (TDNNs) are widely used in both DNN-HMM based hybrid speech recognition systems and recent end-to-end systems. However, the receptive fields of TDNNs are limited and fixed, which is undesirable for tasks such as speech recognition, where the temporal dynamics of speech vary and are affected by many factors. This paper proposes deformable TDNNs for adaptive temporal dynamics modeling in end-to-end speech recognition. Inspired by deformable ConvNets, deformable TDNNs augment the temporal sampling locations with additional offsets and learn the offsets automatically from the ASR criterion, without additional supervision. Experiments show that deformable TDNNs obtain state-of-the-art results on the WSJ benchmarks (1.42%/3.45% WER on WSJ eval92/dev93, respectively), significantly outperforming standard TDNNs. Furthermore, we propose a latency control mechanism for deformable TDNNs, which enables them to perform streaming ASR without accuracy degradation.
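
The idea of adding offsets to the temporal sampling locations can be illustrated with a small sketch. This is not the authors' implementation: the function names, the linear interpolation for fractional positions, the edge clamping, and the `max_lookahead` cap are all assumptions made for illustration. In a real model the offsets would be predicted by the network and trained through the ASR loss; here they are passed in as plain numbers, and the lookahead cap only hints at how a latency limit might restrict future context.

```python
import math


def sample_frame(x, pos):
    """Linearly interpolate a feature vector at fractional time `pos`.

    x is a list of T frames, each a list of D floats. Positions outside
    [0, T-1] are clamped (edge padding, an assumption of this sketch).
    """
    T = len(x)
    pos = min(max(pos, 0.0), T - 1.0)
    lo = int(math.floor(pos))
    hi = min(lo + 1, T - 1)
    frac = pos - lo
    return [(1.0 - frac) * a + frac * b for a, b in zip(x[lo], x[hi])]


def deformable_tdnn_layer(x, weights, context, offsets, max_lookahead=None):
    """One TDNN layer whose context taps are shifted by per-frame offsets.

    x:             T frames of D_in features
    weights:       K matrices of shape D_in x D_out, one per context tap
    context:       K fixed integer taps, e.g. (-1, 0, 1)
    offsets:       T x K fractional offsets (learned in a real model)
    max_lookahead: hypothetical cap on how far past frame t a tap may reach
    """
    T = len(x)
    d_out = len(weights[0][0])
    out = [[0.0] * d_out for _ in range(T)]
    for t in range(T):
        for k, c in enumerate(context):
            pos = t + c + offsets[t][k]      # fixed tap plus offset
            if max_lookahead is not None:
                pos = min(pos, t + max_lookahead)
            frame = sample_frame(x, pos)
            for i, v in enumerate(frame):
                for j in range(d_out):
                    out[t][j] += v * weights[k][i][j]
    return out


# Toy input: 5 frames, 2 features each; identity weights on every tap.
x = [[float(2 * t), float(2 * t + 1)] for t in range(5)]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights = [identity, identity, identity]
context = (-1, 0, 1)
zero = [[0.0, 0.0, 0.0] for _ in range(5)]
shifted = [[0.5, 0.0, 0.0] for _ in range(5)]

standard = deformable_tdnn_layer(x, weights, context, zero)
deformed = deformable_tdnn_layer(x, weights, context, shifted)
streaming = deformable_tdnn_layer(x, weights, context, zero, max_lookahead=0)
```

With all offsets set to zero the layer reduces to a standard TDNN with fixed taps, which is why the offsets can be read as a learned adjustment of the receptive field rather than a different kind of layer.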


Updated: 2021-05-03
