Automatic Error Type Annotation for Arabic


arXiv - CS - Computation and Language, Pub Date: 2021-09-16, DOI: arxiv-2109.08068
Riadh Belkebir, Nizar Habash

We present ARETA, an automatic error type annotation system for Modern Standard Arabic. We design ARETA to address Arabic's morphological richness and orthographic ambiguity. We base our error taxonomy on the Arabic Learner Corpus (ALC) Error Tagset with some modifications. ARETA achieves a micro-averaged F1 score of 85.8% on a manually annotated blind test portion of ALC. We also demonstrate ARETA's usability by applying it to a number of submissions from the QALB 2014 shared task for Arabic grammatical error correction. The resulting analyses give helpful insights into the strengths and weaknesses of the different submissions, and are more informative than the opaque M2 scoring metrics used in the shared task. ARETA employs a large Arabic morphological analyzer but is otherwise completely unsupervised. We make ARETA publicly available.
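
For readers unfamiliar with the micro-averaged F1 score quoted above, the following is a minimal sketch of how such a score is typically computed over error tags. It is not ARETA's evaluation code. The function name `micro_f1`, the tag names (`TAG_A`, `TAG_B`, `TAG_C`), and the assumption that each annotated item carries a set of tags are all illustrative placeholders.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1: pool true/false positives and false negatives
    over all items and all tags, then compute a single F1 score.

    gold, predicted: equal-length sequences of tag sets (hypothetical format).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, predicted):
        gold_tags, pred_tags = set(gold_tags), set(pred_tags)
        tp += len(gold_tags & pred_tags)   # tags predicted and correct
        fp += len(pred_tags - gold_tags)   # tags predicted but not in gold
        fn += len(gold_tags - pred_tags)   # gold tags that were missed

    if tp == 0:
        return 0.0
    # Equivalent to 2PR / (P + R) with pooled precision P and recall R.
    return 2 * tp / (2 * tp + fp + fn)


# Toy data with placeholder tag names (not the ALC tagset).
gold = [{"TAG_A"}, {"TAG_B"}, {"TAG_C"}, {"TAG_A", "TAG_B"}]
predicted = [{"TAG_A"}, {"TAG_C"}, {"TAG_C"}, {"TAG_A"}]
score = micro_f1(gold, predicted)
print(round(score, 3))  # 3 TP, 1 FP, 2 FN -> 0.667
```

Because the counts are pooled before the score is computed, frequent error types weigh more heavily in a micro average than rare ones, unlike a macro average, which would average per-tag F1 scores equally.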


Updated: 2021-09-17
