Extracting Arabic Composite Names Using Genitive Principles of Arabic Grammar
ACM Transactions on Asian and Low-Resource Language Information Processing (IF 2). Pub Date: 2020-06-07. DOI: 10.1145/3382187
Hussein Khalil 1 , Taha Osman 1 , Mohammed Miltan 2

Named Entity Recognition (NER) is a basic prerequisite for using Natural Language Processing (NLP) in information retrieval. Arabic NER is especially challenging because the language is morphologically rich, has short vowels, and has no capitalisation convention. This article presents a novel rule-based approach that uses grammar-based linguistic techniques to extract Arabic composite names from Arabic text. The approach exploits the genitive rules of Arabic grammar, in particular the rules for identifying definite nouns (معرفة) and indefinite nouns (نكرة), to support the extraction of composite names. Based on domain knowledge and Arabic Genitive Rules (AGR), the approach formalises a set of syntactical rules and linguistic patterns that first use genitive patterns to classify definiteness within phrases and then extract proper composite names from the unstructured text. The approach places no constraint on the length of an Arabic composite name, and initial experiments showed high recall and precision when the NER algorithm was applied to a financial-domain corpus.
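
The two-stage idea in the abstract (classify definiteness, then extract a name of unbounded length) can be illustrated with a minimal Python sketch. This is not the authors' rule set, which the abstract does not spell out. It assumes whitespace-tokenised, undiacritised text, treats a token as definite only if it carries the article prefix "ال", and relies on a hypothetical trigger list (`TRIGGERS`) of head nouns such as "شركة" (company) and "بنك" (bank). The names `is_definite` and `extract_composite_names` are likewise illustrative.

```python
# Toy illustration only: every rule here is an assumption, not the paper's AGR rules.

DEFINITE_ARTICLE = "ال"          # Arabic definite article "al-"
TRIGGERS = {"شركة", "بنك"}       # hypothetical head nouns: "company", "bank"


def is_definite(token: str) -> bool:
    """Crude definiteness test: the token starts with the article "ال".

    Assumption: this ignores words whose root happens to begin with these
    letters, attached prefixes (e.g. "وال", "بال"), and proper nouns that
    are definite without an article.
    """
    return token.startswith(DEFINITE_ARTICLE) and len(token) > len(DEFINITE_ARTICLE)


def extract_composite_names(tokens: list[str]) -> list[str]:
    """Collect an indefinite trigger noun plus the run of definite tokens after it.

    The run may be arbitrarily long, so no length limit is imposed on a name.
    """
    names = []
    i = 0
    while i < len(tokens):
        if tokens[i] in TRIGGERS:
            j = i + 1
            # Extend the phrase while the following tokens are definite.
            while j < len(tokens) and is_definite(tokens[j]):
                j += 1
            if j > i + 1:                      # at least one definite token followed
                names.append(" ".join(tokens[i:j]))
            i = j
        else:
            i += 1
    return names


if __name__ == "__main__":
    sentence = "أعلنت شركة الاتصالات السعودية عن أرباحها"
    print(extract_composite_names(sentence.split()))
    # ['شركة الاتصالات السعودية']
```

In the example sentence, the indefinite head noun "شركة" is followed by two definite tokens, and the phrase ends at the preposition "عن", so the sketch returns the three-word name. A real system would need morphological analysis rather than a prefix test to classify definiteness reliably.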

Last updated: 2020-06-07