An Enhanced Corpus for Arabic Newspapers Comments
arXiv - CS - Multiagent Systems. Pub Date: 2021-02-08, DOI: arxiv-2102.09965
Hichem Rahab (TECHNÉ - EA 6316), Abdelhafid Zitouni (TECHNÉ - EA 6316), Mahieddine Djoudi (TECHNÉ - EA 6316)

In this paper, we propose an enhanced approach for creating a dedicated corpus of comments from Algerian Arabic newspapers. The approach improves on an existing one by enriching the available corpus and adding an annotation step that follows the Model, Annotate, Train, Test, Evaluate, Revise (MATTER) methodology. The corpus is built by collecting comments from the websites of three well-known Algerian newspapers. Three classifiers, support vector machines, naïve Bayes, and k-nearest neighbors, were used to classify comments into positive and negative classes. To measure the influence of stemming on the results, classification was tested with and without stemming. The results show that stemming does not considerably improve classification, owing to the nature of Algerian comments, which are tied to the Algerian Arabic dialect. These promising results motivate us to improve our approach, particularly in handling non-Arabic sentences, especially dialectal and French ones.
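
The with-and-without-stemming comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it implements only one of the three classifiers (a multinomial naïve Bayes with add-one smoothing), and the `light_stem` suffix stripper, the `TRAIN`/`TEST` comments, and the `evaluate` helper are all hypothetical stand-ins written in English for readability. A real experiment would use the annotated Arabic corpus and an Arabic stemmer.

```python
import math
from collections import Counter


def light_stem(token):
    """Toy suffix stripper (assumption): a stand-in for a real Arabic stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text, stem=False):
    """Lowercase, split on whitespace, and optionally stem each token."""
    tokens = text.lower().split()
    return [light_stem(t) for t in tokens] if stem else tokens


class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy comments; the paper's corpus is Algerian Arabic.
TRAIN = [
    ("great article loved it", "positive"),
    ("excellent reporting", "positive"),
    ("terrible article hated it", "negative"),
    ("misleading reporting", "negative"),
]
TEST = [
    ("loving this great report", "positive"),
    ("hating this terrible report", "negative"),
]


def evaluate(train, test, stem):
    """Train on `train`, return accuracy on `test`, with or without stemming."""
    docs = [tokenize(text, stem) for text, _ in train]
    labels = [label for _, label in train]
    model = NaiveBayes().fit(docs, labels)
    hits = sum(model.predict(tokenize(text, stem)) == label for text, label in test)
    return hits / len(test)


if __name__ == "__main__":
    for stem in (False, True):
        print(f"stemming={stem}: accuracy={evaluate(TRAIN, TEST, stem):.2f}")
```

Keeping stemming as a flag on `tokenize` means the same classifier and the same data split are reused for both runs, so any difference in accuracy can be attributed to stemming alone. The toy data is far too small to say anything about whether stemming helps in practice.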

Updated: 2021-02-22