Automatic accent identification as an analytical tool for accent robust automatic speech recognition, Speech Communication

Speech Communication (IF 3.2), Pub Date: 2020-06-04, DOI: 10.1016/j.specom.2020.05.003
Maryam Najafian, Martin Russell

We present a novel study of relationships between automatic accent identification (AID) and accent-robust automatic speech recognition (ASR), using i-vector based AID and deep neural network-hidden Markov model (DNN-HMM) based ASR. A visualization of the AID i-vector space and a novel analysis of the accent content of the WSJCAM0 corpus are presented. Accents that occur at the periphery of AID space are referred to as “extreme”. We demonstrate a negative correlation, with respect to accent, between AID and ASR accuracy, where extreme accents exhibit the highest AID and lowest ASR performance. These relationships between accents inform a set of ASR experiments in which a generic training set (WSJCAM0) is supplemented with a fixed amount of accented data from the ABI (Accents of the British Isles) corpus. The best performance across all accents, a 32% relative reduction in errors compared with the baseline ASR system, is obtained when the supplementary data comprises extreme accented speech, even though this accent accounts for just 14% of the test data. We conclude that i-vector based AID analysis provides a principled approach to the selection of training material for accent-robust ASR. We speculate that this may generalize to other detection technologies and other types of variability, such as Speaker Identification (SI) and speaker variability.
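The two quantities the abstract reports, a relative reduction in errors and a negative correlation between per-accent AID and ASR accuracy, can be illustrated with a minimal Python sketch. Every number below is a hypothetical placeholder and is not taken from the paper; the error rates 25.0 and 17.0 were chosen only because they yield a 32% relative reduction, and the function names are illustrative assumptions rather than anything the authors define.

```python
from math import sqrt


def relative_error_reduction(baseline_err, new_err):
    """Fraction by which the error rate falls relative to the baseline."""
    if baseline_err <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_err - new_err) / baseline_err


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-accent accuracies in percent (NOT results from the paper).
# Accents that are easier to identify are given lower ASR accuracy here,
# mirroring the direction of the relationship the abstract describes.
aid_accuracy = [95.0, 80.0, 70.0, 60.0]
asr_accuracy = [70.0, 80.0, 85.0, 90.0]
r = pearson(aid_accuracy, asr_accuracy)

# Hypothetical baseline and improved error rates in percent.
rer = relative_error_reduction(25.0, 17.0)

print(f"correlation between AID and ASR accuracy: {r:.2f}")
print(f"relative error reduction: {rer:.0%}")
```

A relative reduction is measured against the baseline error rate rather than in absolute percentage points, so the same absolute gain counts for more when the baseline error is already low.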


Updated: 2020-06-04
