Investigating the effects of gender, dialect, and training size on the performance of Arabic speech recognition
Language Resources and Evaluation (IF 2.7), Pub Date: 2020-10-12, DOI: 10.1007/s10579-020-09505-5
Eiman Alsharhan, Allan Ramsay

Research in Arabic automatic speech recognition (ASR) is constrained by datasets of limited size and of highly variable content and quality. Arabic-language resources vary in the attributes that affect language resources in other languages (noise, channel, speaker, genre), but also vary significantly in the dialect and level of formality of the spoken Arabic they capture. Many languages suffer similar levels of cross-dialect and cross-register acoustic variability, but these effects have been under-studied. This paper is an experimental analysis of the interaction between classical ASR corpus-compensation methods (feature selection, data selection, gender-dependent acoustic models) and the dialect-dependent and register-dependent variation among Arabic ASR corpora. The first interaction studied is that between acoustic recording quality and discrete pronunciation variation. Discrete pronunciation variation can be compensated by using grapheme-based instead of phone-based acoustic models, and by filtering out speakers with insufficient training data; the latter technique also helps to compensate for poor recording quality, which is further compensated by eliminating delta-delta acoustic features. Together, the three techniques reduce Word Error Rate (WER) by between 3.24% and 5.35%. The second aspect of dialect and register variation considered is variation in the fine-grained acoustic pronunciation of each phoneme in the language. Experimental results prove that gender and dialect are the principal components of variation in speech; therefore, building gender-specific and dialect-specific models leads to substantial decreases in WER.
To further explore the degree of acoustic difference between the phone models required for each dialect of Arabic, cross-dialect experiments are conducted to measure how far apart the dialects are acoustically, so as to make a better decision about the minimal number of recognition systems needed to cover all dialectal Arabic. Finally, the research addresses an important question: how much training data is needed to build efficient speaker-independent ASR systems? This includes developing learning curves to find out how large the training set must be to achieve acceptable performance.
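
For readers unfamiliar with the metric quoted above, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. The abstract does not say whether its reported reductions are absolute or relative, so the sketch makes no claim about that.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / reference length.

    Hypothetical helper for illustration; tokenization is plain
    whitespace splitting, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many insertions.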




Updated: 2020-10-12