Deep Learning for Arabic Error Detection and Correction
ACM Transactions on Asian and Low-Resource Language Information Processing ( IF 1.8 ) Pub Date : 2020-07-07 , DOI: 10.1145/3373266
Manar Alkhatib 1 , Azza Abdel Monem 2 , Khaled Shaalan 1

Research on tools for automating the proofreading of Arabic text has received much attention in recent years. There is an increasing demand for applications that can detect and correct Arabic spelling and grammatical errors to improve the quality of Arabic text content and application input. Our review of previous studies indicates that few Arabic spell-checking research efforts appropriately address the detection and correction of ill-formed words that do not conform to the Arabic morphology system. Even fewer systems address the detection and correction of erroneous well-formed Arabic words that are either contextually or semantically inconsistent within the text. We introduce an approach that employs deep neural network technology for error detection in Arabic text. We have developed a systematic framework for spelling and grammar error detection, as well as correction at the word level, based on a bidirectional long short-term memory mechanism and word embedding, with a polynomial network classifier at the top of the system. To obtain conclusive results, we have developed the most significant gold standard annotated corpus to date, containing 15 million fully inflected Arabic words. The data were collected from diverse text sources and genres, and every erroneous and ill-formed word has been annotated, validated, and manually revised by Arabic specialists. This valuable asset is available to the Arabic natural language processing research community. The experimental results confirm that our proposed system significantly outperforms Microsoft Word 2013 and Open Office Ayaspell 3.4, which have been used in the literature for evaluating similar research.
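
The architecture named in the abstract (word embeddings feeding a bidirectional LSTM, with a classifier on top that scores each word) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no layer sizes, training procedure, or details of the polynomial network classifier, so the sketch below substitutes a plain logistic output layer, uses random untrained weights, and invents every name (`LSTMCell`, `BiLSTMErrorTagger`, `error_probabilities`) and dimension for illustration only. It shows just the data flow: each word receives a score computed from both its left and right context.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LSTMCell:
    """One LSTM direction. Weights are random placeholders, not trained values."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        width = input_dim + hidden_dim
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                   for _ in range(hidden_dim)]
            for gate in self.GATES
        }

    def step(self, x, h, c):
        z = x + h  # concatenate current input with previous hidden state
        pre = {gate: matvec(self.weights[gate], z) for gate in self.GATES}
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        cand = [math.tanh(v) for v in pre["candidate"]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def run(self, inputs):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        states = []
        for x in inputs:
            h, c = self.step(x, h, c)
            states.append(h)
        return states


class BiLSTMErrorTagger:
    """Hypothetical word-level error scorer: embeddings -> BiLSTM -> logistic layer."""

    def __init__(self, vocab, embed_dim=8, hidden_dim=6, seed=0):
        rng = random.Random(seed)
        self.embeddings = {
            word: [rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
            for word in sorted(vocab)
        }
        self.unknown = [0.0] * embed_dim  # fallback for out-of-vocabulary words
        self.forward_cell = LSTMCell(embed_dim, hidden_dim, rng)
        self.backward_cell = LSTMCell(embed_dim, hidden_dim, rng)
        # Assumption: a logistic layer stands in for the paper's polynomial network.
        self.out_weights = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim)]

    def error_probabilities(self, tokens):
        inputs = [self.embeddings.get(t, self.unknown) for t in tokens]
        left = self.forward_cell.run(inputs)
        # Run the second cell right-to-left, then restore the original order.
        right = self.backward_cell.run(inputs[::-1])[::-1]
        probs = []
        for h_left, h_right in zip(left, right):
            features = h_left + h_right
            score = sum(w * v for w, v in zip(self.out_weights, features))
            probs.append(sigmoid(score))
        return probs


# Illustrative input only; the scores are meaningless until the model is trained.
tokens = ["ذهب", "الولد", "إلى", "المدرسة"]
tagger = BiLSTMErrorTagger(vocab=tokens)
for token, prob in zip(tokens, tagger.error_probabilities(tokens)):
    print(token, round(prob, 3))
```

In a trained system, words whose score exceeds a chosen threshold would be flagged as errors and passed to a correction step. Because the backward pass reads the sentence from the end, a well-formed word that is inconsistent with the words that follow it can still influence its own score, which is the property that makes a bidirectional model suitable for the contextual errors the abstract describes.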

Updated: 2020-07-07