FarsBase-KBP: A knowledge base population system for the Persian Knowledge Graph, Journal of Web Semantics



Journal of Web Semantics (IF 2.5), Pub Date: 2021-03-24, DOI: 10.1016/j.websem.2021.100638
Majid Asgari-Bidhendi, Behrooz Janfada, Behrouz Minaei-Bidgoli

While most knowledge bases already support the English language, there is only one knowledge base for the Persian language, known as FarsBase, which is automatically created from semi-structured web information. Unlike English knowledge bases such as Wikidata, which have tremendous community support, the population of a knowledge base like FarsBase must rely on automatically extracted knowledge. Knowledge base population lets FarsBase keep growing in size as the system continues working. In this paper, we present a knowledge base population system for the Persian language, which extracts knowledge from unlabelled raw text crawled from the Web. The proposed system consists of a set of state-of-the-art modules, such as an entity linking module and information and relation extraction modules designed for FarsBase. Moreover, a canonicalization system is introduced to link extracted relations to FarsBase properties. The system then uses knowledge fusion techniques, with minimal intervention from human experts, to integrate and filter the proper knowledge instances extracted by each module. To evaluate the performance of the presented knowledge base population system, we present the first gold dataset for benchmarking knowledge base population in the Persian language, which consists of 22,015 FarsBase triples verified by human experts. The evaluation results demonstrate the efficiency of the proposed system.
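
The pipeline described in the abstract (several extraction modules, canonicalization of relation phrases to knowledge-base properties, then fusion) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `Triple` type, the `CANONICAL_PROPERTIES` table, the `ex:` property names, the module names, and the simple vote-counting rule in `fuse` are hypothetical and are not taken from the paper, whose actual canonicalization and fusion methods are not specified in the abstract.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical lookup from surface relation phrases to KB properties.
# The "ex:" names are placeholders, not real FarsBase properties.
CANONICAL_PROPERTIES = {
    "was born in": "ex:birthPlace",
    "born in": "ex:birthPlace",
    "is the capital of": "ex:capitalOf",
}


def canonicalize(triple: Triple) -> Optional[Triple]:
    """Map a raw relation phrase to a KB property, or None if unknown."""
    prop = CANONICAL_PROPERTIES.get(triple.predicate.strip().lower())
    if prop is None:
        return None
    return Triple(triple.subject, prop, triple.obj)


def fuse(extractions: dict, min_votes: int = 2) -> list:
    """Keep canonical triples proposed by at least `min_votes` modules.

    `extractions` maps a module name to the raw triples it produced.
    Vote counting stands in for the paper's fusion step (an assumption).
    """
    votes = defaultdict(set)
    for module, triples in extractions.items():
        for raw in triples:
            canonical = canonicalize(raw)
            if canonical is not None:
                votes[canonical].add(module)
    return sorted(t for t, modules in votes.items() if len(modules) >= min_votes)


# Toy input: three hypothetical extraction modules.
extractions = {
    "rule_based": [
        Triple("Hafez", "was born in", "Shiraz"),
        Triple("Tehran", "is the capital of", "Iran"),
    ],
    "distant_supervision": [Triple("Hafez", "born in", "Shiraz")],
    "dependency_patterns": [
        Triple("Hafez", "Born in ", "Shiraz"),
        Triple("Hafez", "wrote", "Divan"),
    ],
}

fused = fuse(extractions)
```

In this toy input, only the birthplace triple is proposed by more than one module after canonicalization, so it is the only one retained; the phrase "wrote" has no entry in the lookup table and is dropped before voting.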


Updated: 2021-04-01
