Detection of Lexical Stress Errors in Non-native (L2) English with Data Augmentation and Attention
arXiv - CS - Sound, Pub Date: 2020-12-29, DOI: arxiv-2012.14788
Daniel Korzekwa, Roberto Barra-Chicote, Szymon Zaporowski, Grzegorz Beringer, Jaime Lorenzo-Trueba, Alicja Serafinowicz, Jasha Droppo, Thomas Drugman, Bozena Kostek

This paper describes two novel, complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In the classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically derives an optimal syllable-level representation from frame-level and phoneme-level audio features. Training this model is challenging because examples of incorrect stress patterns are scarce. To solve this problem, we propose to augment the training set with incorrectly stressed words generated with Neural TTS. Combining both techniques achieves 94.8% precision and 49.2% recall for the detection of incorrectly stressed words in L2 English speech of Slavic speakers.
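
The idea of deriving a syllable-level representation from frame-level features with attention can be illustrated with a minimal sketch. This is not the authors' model: the abstract gives no architectural details, so the dot-product scoring, the function name `attention_pool`, the fixed `query` vector, and the toy feature values are all assumptions made for illustration. In a trained model the scoring parameters would be learned rather than set by hand.

```python
import math

def attention_pool(frames, query):
    """Collapse a list of frame-level feature vectors into one vector.

    Each frame is scored by its dot product with `query`, the scores are
    normalised with a softmax, and the frames are averaged with those weights.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames))
              for d in range(dim)]
    return pooled, weights

# Toy input: three frames of one syllable, two features per frame (made-up values).
frames = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]
query = [1.0, 1.0]  # hypothetical; a real model would learn this
pooled, weights = attention_pool(frames, query)
```

Unlike extraction from a fixed region such as the syllable nucleus, the weights here depend on the frames themselves, so the frames that contribute most can differ from one syllable to the next.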

Updated: 2021-01-01