Building a morpho-semantic knowledge graph for Arabic information retrieval
Information Processing & Management ( IF 8.6 ) Pub Date : 2019-09-25 , DOI: 10.1016/j.ipm.2019.102124
Ibrahim Bounhas , Nadia Soudani , Yahya Slimani

In this paper, we propose to build a morpho-semantic knowledge graph from vocalized Arabic corpora. Our work focuses on classical Arabic, which has not been deeply investigated in related work. We use a tool suite that analyzes and disambiguates Arabic texts, taking short diacritics into account to reduce ambiguity. At the morphological level, we combine the Ghwanmeh stemmer and MADAMIRA, both adapted to extract a multi-level lexicon from vocalized Arabic corpora. At the semantic level, we infer semantic dependencies between tokens by exploiting contextual knowledge extracted by a concordancer. Both morphological and semantic links are represented as compressed graphs, which are accessed through lazy methods. These graphs are mined using a measure inspired by BM25 to compute one-to-many similarity. We evaluate the morpho-semantic knowledge graph in the context of Arabic Information Retrieval (IR), assessing several scenarios of document indexing and query expansion. Specifically, we vary the indexing units for Arabic IR according to different levels of morphological knowledge, a challenging issue that previous research has not resolved. We also experiment with several combinations of morpho-semantic query expansion. This allows us to validate our resource and to study its impact on IR using state-of-the-art evaluation metrics.
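The abstract does not give the formula of the BM25-inspired measure, so the following Python sketch is only an illustration of how a one-to-many similarity over a term graph could drive query expansion. Everything in it is an assumption made for the example: the function names (`build_graph`, `bm25_style_scores`, `expand_query`), the use of edge weights in place of term frequency, the use of a node's total edge weight in place of document length, the default values of `k1` and `b`, and the transliterated placeholder terms. It also uses a plain in-memory dictionary rather than the compressed, lazily accessed graphs the authors describe.

```python
import math
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted graph from (term_a, term_b, weight) triples."""
    graph = defaultdict(dict)
    for a, b, w in edges:
        graph[a][b] = graph[a].get(b, 0.0) + w
        graph[b][a] = graph[b].get(a, 0.0) + w
    return dict(graph)


def bm25_style_scores(graph, query_terms, k1=1.2, b=0.75):
    """Score every neighbour of the query terms against the whole query.

    Hypothetical BM25 analogy (not the paper's formula):
      - edge weight       plays the role of term frequency,
      - node strength     (sum of edge weights) plays the role of document length,
      - neighbour count   of a query term plays the role of document frequency.
    """
    if not graph:
        return {}
    query = set(query_terms)
    n_nodes = len(graph)
    strength = {t: sum(nb.values()) for t, nb in graph.items()}
    avg_strength = sum(strength.values()) / n_nodes

    scores = defaultdict(float)
    for q in query:
        neighbours = graph.get(q, {})
        if not neighbours:
            continue  # unknown or isolated query term contributes nothing
        n_q = len(neighbours)
        idf = math.log(1.0 + (n_nodes - n_q + 0.5) / (n_q + 0.5))
        for cand, w in neighbours.items():
            if cand in query:
                continue  # do not propose terms already in the query
            norm = 1.0 - b + b * strength[cand] / avg_strength
            scores[cand] += idf * w * (k1 + 1.0) / (w + k1 * norm)
    return dict(scores)


def expand_query(graph, query_terms, top_n=2, **kwargs):
    """Append the top_n best-scoring candidate terms to the original query."""
    scores = bm25_style_scores(graph, query_terms, **kwargs)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return list(query_terms) + ranked[:top_n]


# Placeholder terms and weights, invented purely for illustration.
edges = [
    ("kitab", "maktaba", 3),
    ("kitab", "katib", 2),
    ("qalam", "katib", 4),
    ("qalam", "hibr", 1),
]
graph = build_graph(edges)
expanded = expand_query(graph, ["kitab", "qalam"], top_n=2)
```

In this toy graph, the candidate linked to both query terms accumulates a contribution from each of them, which is the sense in which the similarity is one-to-many: a candidate is scored against the query as a whole rather than against a single term.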




Updated: 2020-04-21