Quantifying and Maximizing the Benefits of Back-End Noise Adaption on Attention-Based Speech Recognition Models
arXiv - CS - Sound. Pub Date: 2021-05-03, DOI: arxiv-2105.01134
Coleman Hooper, Thierry Tambe, Gu-Yeon Wei

This work analyzes how attention-based Bidirectional Long Short-Term Memory (BLSTM) models adapt to noise-augmented speech. We identify the components crucial for noise adaptation in BLSTM models by freezing model components during fine-tuning. We first freeze larger model subnetworks and then, after identifying the encoder's importance for noise adaptation, pursue a fine-grained freezing approach within it. The first encoder layer is shown to be crucial for noise adaptation, and its weights are shown to matter more than those of the other layers. Appreciable accuracy benefits on a target noisy environment are identified when fine-tuning starts from a model pretrained with noisy speech rather than from a model pretrained with only clean speech. For this analysis, we produce our own dataset augmentation tool, which is open-sourced to encourage future efforts in exploring noise adaptation in ASR.
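To make the freezing methodology concrete, the sketch below shows how subnetworks of a PyTorch encoder-decoder model can be excluded from gradient updates during fine-tuning, first coarsely (an entire decoder) and then at the granularity of single encoder layers. This is not the authors' released code; the toy model and its attribute names are hypothetical stand-ins for the attention-based BLSTM architecture studied in the paper.

import torch
import torch.nn as nn

# Toy stand-in for an attention-based BLSTM ASR model; only the freezing
# mechanics matter here, not the architecture details.
class ToyBLSTMASR(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, vocab=30, enc_layers=4):
        super().__init__()
        self.encoder = nn.ModuleList(
            [nn.LSTM(feat_dim if i == 0 else 2 * hidden, hidden,
                     batch_first=True, bidirectional=True)
             for i in range(enc_layers)])
        self.decoder = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.output = nn.Linear(hidden, vocab)

def freeze(module):
    # Exclude a subnetwork from gradient updates during fine-tuning.
    for p in module.parameters():
        p.requires_grad = False

model = ToyBLSTMASR()              # in practice, load a pretrained checkpoint
freeze(model.decoder)              # coarse freezing: hold a large subnetwork fixed
for layer in model.encoder[1:]:
    freeze(layer)                  # fine-grained freezing: only encoder layer 0 adapts

# Only the parameters left trainable are handed to the optimizer, so the
# frozen components keep their pretrained values throughout fine-tuning.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)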

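The dataset augmentation tool mentioned above mixes noise recordings into clean speech. A minimal sketch of that idea, assuming NumPy waveforms at a common sample rate (the mix_at_snr helper is hypothetical, not part of the released tool), scales the noise so the mixture reaches a target signal-to-noise ratio:

import numpy as np

def mix_at_snr(clean, noise, snr_db):
    # Tile or truncate the noise to the utterance length, then scale it so
    # the mixture has the requested signal-to-noise ratio in dB.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    clean_power = np.mean(clean.astype(np.float64) ** 2)
    noise_power = np.mean(noise.astype(np.float64) ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

# Example: augment a clean utterance with babble noise at 5 dB SNR.
# noisy = mix_at_snr(clean_waveform, babble_waveform, snr_db=5)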
Updated: 2021-05-05