Entity disambiguation with context awareness in user-generated short texts
Expert Systems with Applications ( IF 7.5 ) Pub Date : 2020-06-20 , DOI: 10.1016/j.eswa.2020.113652
Jiaqi Yang , Yongjun Li , Congjie Gao , Wei Dong

Conceptualization is the task of obtaining the most appropriate concepts for noun terms (entities) in a given context, and it plays an important role in human knowledge understanding. In natural language, however, entities are often ambiguous, which makes conceptualization difficult. To conceptualize accurately, the ambiguity of entities must first be resolved. Existing methods mainly rely on similar or related entities in the context for disambiguation, but because user-generated short texts are sparse, only a limited number of entities can be extracted from them. In this paper, we propose an entity disambiguation method that consists of three steps. (1) Measuring the correlation between terms, using both corpus and knowledge information to capture the specific semantic relationship. (2) Selecting informative terms, considering various types of contextual terms rather than entities alone, which mitigates the effects of text sparsity. (3) Prioritizing informative terms to highlight their discriminative power, which reduces noise interference. Finally, the target entity is disambiguated based on the informative terms. Experimental results on ground-truth datasets demonstrate that the proposed method outperforms baseline methods.
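The three steps in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not give the correlation measure, the selection rule, or the weighting scheme, so everything below is an assumption made for illustration. The `RELATEDNESS` table holds invented numbers standing in for step (1), the score gap used as "discriminative power" and the `min_power` threshold are hypothetical stand-ins for steps (2) and (3), and all function names are made up.

```python
# Hypothetical relatedness scores between context terms and candidate concepts.
# In the paper these would be derived from corpus and knowledge information;
# the values here are invented purely for illustration (step 1).
RELATEDNESS = {
    ("eat", "fruit"): 0.9,
    ("eat", "company"): 0.1,
    ("pie", "fruit"): 0.8,
    ("pie", "company"): 0.05,
    ("the", "fruit"): 0.3,
    ("the", "company"): 0.3,
    ("iphone", "fruit"): 0.05,
    ("iphone", "company"): 0.95,
}


def relatedness(term, concept):
    """Look up the (assumed) correlation between a term and a concept."""
    return RELATEDNESS.get((term, concept), 0.0)


def discriminative_power(term, concepts):
    """Gap between the best and second-best concept score for a term.

    A term that relates equally to every concept gets a gap of 0 and is
    treated as noise. This gap is an assumed stand-in for step 3.
    """
    scores = sorted((relatedness(term, c) for c in concepts), reverse=True)
    if not scores:
        return 0.0
    if len(scores) == 1:
        return scores[0]
    return scores[0] - scores[1]


def select_informative_terms(context, concepts, min_power=0.2):
    """Keep context terms of any type whose power passes a threshold (step 2).

    Returns a dict mapping each kept term to its weight.
    """
    weights = {}
    for term in context:
        power = discriminative_power(term, concepts)
        if power >= min_power:
            weights[term] = power
    return weights


def disambiguate(context, concepts):
    """Pick the concept with the highest weighted relatedness.

    Returns None when no context term is informative.
    """
    weights = select_informative_terms(context, concepts)
    if not weights:
        return None
    totals = {
        c: sum(w * relatedness(t, c) for t, w in weights.items())
        for c in concepts
    }
    return max(totals, key=totals.get)


if __name__ == "__main__":
    concepts = ["fruit", "company"]  # candidate concepts for "apple"
    print(disambiguate(["eat", "the", "pie"], concepts))
    print(disambiguate(["the", "iphone"], concepts))
```

In this toy setup the verb "eat" contributes to the decision even though it is not an entity, which mirrors the abstract's point about using various types of contextual terms, while the uninformative "the" is filtered out before scoring.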




Updated: 2020-06-20