An intelligent use of stemmer and morphology analysis for Arabic information retrieval
Egyptian Informatics Journal ( IF 5.2 ) Pub Date : 2020-03-07 , DOI: 10.1016/j.eij.2020.02.004
Ali Alnaied , Mosa Elbendak , Abdullah Bulbul

Arabic information retrieval has gained significant attention owing to the increasing use of Arabic text on the web and on social media networks. This paper presents a new approach to Arabic stemming, called Arabic Morphology Information Retrieval (AMIR), which generates/extracts stems by applying a set of rules on the relationships among Arabic letters to find the root/stem of each word; these roots/stems serve as indexing terms for text search in Arabic retrieval systems. To demonstrate the usefulness of the proposed algorithm, we highlight the benefits of the proposed rules for different Arabic information retrieval systems. Finally, we evaluate the AMIR system by comparing its performance with LUCENE, FARASA, and a no-stemmer baseline in terms of mean average precision. The results show that AMIR achieves a mean average precision of 0.34%, while LUCENE, FARASA, and the no-stemmer system give 0.27%, 0.28%, and 0.21, respectively. This demonstrates that AMIR improves Arabic stemming and retrieval effectiveness and is robust to any type of stem.
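
The comparison above is reported as mean average precision (MAP). As a minimal sketch of how that metric is commonly computed, and not the authors' evaluation code, the Python below averages the precision at each rank where a relevant document appears, then averages those per-query scores. The function names, document IDs, and query IDs are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query.

    `ranked` is the system's result list, best first; `relevant` is the
    set of document IDs judged relevant for the query.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision measured at the rank of each relevant hit.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all judged queries."""
    if not qrels:
        return 0.0
    total = sum(
        average_precision(runs.get(query_id, []), relevant)
        for query_id, relevant in qrels.items()
    )
    return total / len(qrels)


# Hypothetical toy data: two queries with made-up document IDs.
example_runs = {"q1": ["d1", "d2", "d3"], "q2": ["d5", "d4"]}
example_qrels = {"q1": {"d1", "d3"}, "q2": {"d4"}}
example_map = mean_average_precision(example_runs, example_qrels)
```

In this toy data, query q1 scores (1/1 + 2/3) / 2 and query q2 scores 1/2, so the mean is 2/3. Under this definition a stemmer changes the score only through the ranked lists it produces, which is why the same queries and relevance judgments can be reused across AMIR, LUCENE, FARASA, and the no-stemmer baseline.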




Updated: 2020-03-07