A Framework for Indonesian Grammar Error Correction
ACM Transactions on Asian and Low-Resource Language Information Processing (IF 1.8), Pub Date: 2021-05-26, DOI: 10.1145/3440993
Nankai Lin 1 , Boyu Chen 1 , Xiaotian Lin 1 , Kanoksak Wattanachote 1 , Shengyi Jiang 1

Grammatical Error Correction (GEC) is a challenge in Natural Language Processing research. Although many researchers have focused on GEC for widely used languages such as English and Chinese, few studies address Indonesian, which is a low-resource language. In this article, we propose a GEC framework that has the potential to serve as a baseline method for Indonesian GEC tasks. The framework treats GEC as a multi-class classification task. It integrates different language embedding models and deep learning models to correct 10 types of Part of Speech (POS) errors in Indonesian text. In addition, we constructed an Indonesian corpus that can be used as an evaluation dataset for Indonesian GEC research, and we evaluated our framework on this dataset. Results showed that the Long Short-Term Memory model based on word embeddings achieved the best performance: its overall macro-averaged F0.5 across the 10 POS error types reached 0.551. Results also showed that the framework can be trained on a low-resource dataset.
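
For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of how a macro-averaged F0.5 score can be computed for a multi-class classifier. It is an illustration only: the abstract does not describe the paper's evaluation protocol, so the function names `f_beta` and `macro_f_beta`, the label strings, and the choice to average over every label that appears in either the gold or the predicted sequence are all assumptions.

```python
from collections import Counter


def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_f_beta(gold, pred, beta=0.5):
    """Unweighted mean of per-class F-beta (hypothetical helper, not the paper's code)."""
    # Assumption: average over every label seen in gold or pred.
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed

    scores = []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        scores.append(f_beta(precision, recall, beta))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up labels standing in for POS error types.
    gold = ["prep", "prep", "verb", "noun"]
    pred = ["prep", "verb", "verb", "noun"]
    print(round(macro_f_beta(gold, pred), 3))
```

Because the macro average gives each class equal weight, a class with few examples influences the overall score as much as a frequent one, which matters when some error types are rare.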

Updated: 2021-05-26