Arabic Diacritic Recovery Using a Feature-rich biLSTM Model
ACM Transactions on Asian and Low-Resource Language Information Processing (IF 1.8), Pub Date: 2021-04-15, DOI: 10.1145/3434235
Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Mohamed Eldesouki

Diacritics (short vowels) are typically omitted when writing Arabic text, and readers have to reintroduce them to pronounce words correctly. There are two types of Arabic diacritics: the first are core-word diacritics (CW), which specify the lexical selection, and the second are case endings (CE), which typically appear at the end of word stems and generally specify their syntactic roles. Recovering CEs is harder than recovering core-word diacritics because of inter-word dependencies, which are often distant. In this article, we use a feature-rich recurrent neural network model that uses a variety of linguistic and surface-level features to recover both core-word diacritics and case endings. Our model surpasses all previous state-of-the-art systems with a CW error rate (CWER) of 2.9% and a CE error rate (CEER) of 3.7% for Modern Standard Arabic (MSA), and a CWER of 2.2% and a CEER of 2.5% for Classical Arabic (CA). When diacritized word cores are combined with case endings, the resulting word error rates are 6.0% and 4.3% for MSA and CA, respectively. This highlights the effectiveness of feature engineering for such deep neural models.
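
The relationship between the three reported error rates can be illustrated with a minimal sketch. It assumes that each word is scored as a (core, case ending) pair and that a word counts as wrong if either part is wrong; the abstract does not spell out the exact scoring procedure, so this definition, the function name `error_rates`, and the romanized toy words are all hypothetical. Under this assumption the word error rate can never exceed CWER + CEER, which is consistent with the reported figures (6.0% ≤ 2.9% + 3.7% for MSA, 4.3% ≤ 2.2% + 2.5% for CA).

```python
def error_rates(gold, pred):
    """Return (CWER, CEER, WER) for aligned lists of (core, case_ending) pairs.

    Assumption (not taken from the paper): a word is an error if its
    core diacritization or its case ending differs from the gold pair.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    n = len(gold)
    cw_errors = sum(g[0] != p[0] for g, p in zip(gold, pred))  # core mismatches
    ce_errors = sum(g[1] != p[1] for g, p in zip(gold, pred))  # case-ending mismatches
    word_errors = sum(g != p for g, p in zip(gold, pred))      # either part wrong
    return cw_errors / n, ce_errors / n, word_errors / n


# Hypothetical romanized toy data: (diacritized core, case ending).
gold = [("kataba", ""), ("Alwalad", "u"), ("Aldars", "a"), ("fiy", "")]
pred = [("kataba", ""), ("Alwalad", "a"), ("Aldurs", "i"), ("fiy", "")]

cwer, ceer, wer = error_rates(gold, pred)
print(f"CWER={cwer:.2f} CEER={ceer:.2f} WER={wer:.2f}")
```

In the toy data the third word has both a core error and a case-ending error, so it is counted once in the word error rate; this overlap is why the combined rate can fall below the sum of the two component rates.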

Updated: 2021-04-15