Knowledge-driven joint posterior revision of named entity classification and linking
Journal of Web Semantics ( IF 2.1 ) Pub Date : 2020-10-10 , DOI: 10.1016/j.websem.2020.100617
Marco Rospocher , Francesco Corcoglioniti

In this work we address the problem of extracting high-quality entity knowledge from natural language text, an important task for the automatic construction of knowledge graphs from unstructured content.

In more detail, we investigate the benefit of performing a joint posterior revision, driven by ontological background knowledge, of the annotations produced by natural language processing (NLP) entity analyses such as named entity recognition and classification (NERC) and entity linking (EL). The revision is performed via a probabilistic model, called jpark, that, given the candidate annotations independently identified by NERC and EL tools on the same textual entity mention, reconsiders the tools' best annotation choice in light of the coherence of the candidate annotations with the ontological knowledge. The model can be explicitly instructed to handle the information that an entity may be NIL (i.e., lacking a corresponding referent in the target linking knowledge base), exploiting it to predict the best combination of NERC and EL annotations.
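The joint-revision idea can be illustrated with a minimal sketch. This is not the actual jpark model (which is probabilistic and learned); it only shows the core intuition under simplifying assumptions: each tool provides scored candidates, and each (type, entity) combination is rescored by a coherence term derived from background knowledge, with NIL compatible with any type. All names, entities, and scores below are illustrative.

```python
# Candidate NERC types with tool confidences (illustrative values).
nerc_candidates = {"PER": 0.6, "ORG": 0.4}

# Candidate EL entities with tool confidences; None stands for NIL
# (no referent in the linking knowledge base).
el_candidates = {"dbpedia:Paris_Hilton": 0.5, "dbpedia:Paris": 0.3, None: 0.2}

# Background knowledge: the type each entity has in the knowledge base.
kb_types = {"dbpedia:Paris_Hilton": "PER", "dbpedia:Paris": "LOC"}

def coherence(nerc_type, entity):
    """1.0 when the KB type of the entity agrees with the NERC type
    (or the entity is NIL); otherwise a small penalty rather than a
    hard zero, so incoherent pairs are discouraged but not forbidden."""
    if entity is None:
        return 1.0
    return 1.0 if kb_types.get(entity) == nerc_type else 0.1

def joint_revision(nerc_candidates, el_candidates):
    """Pick the (type, entity) pair maximizing tool scores x coherence."""
    best, best_score = None, float("-inf")
    for t, pt in nerc_candidates.items():
        for e, pe in el_candidates.items():
            score = pt * pe * coherence(t, e)
            if score > best_score:
                best, best_score = (t, e), score
    return best

print(joint_revision(nerc_candidates, el_candidates))
```

With these toy scores, the coherent pair ("PER", "dbpedia:Paris_Hilton") wins even though each tool scored its candidates independently; this is the sense in which background knowledge revises the tools' choices jointly rather than separately.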

We present a comprehensive evaluation of jpark along several dimensions, comparing its performance with and without exploiting NIL information, as well as the use of three different background knowledge resources (YAGO, DBpedia, and Wikidata) to build the model. The evaluation, conducted with different tools (the popular Stanford NER and DBpedia Spotlight, as well as the more recent Flair NER and End-to-End Neural EL) on three reference datasets (AIDA, MEANTIME, and TAC-KBP), empirically confirms the capability of the model to improve the quality of the annotations produced by the given tools, and thus their performance on the tasks they are designed for.




Updated: 2020-10-29