Tab2Know: Building a Knowledge Base from Tables in Scientific Papers
arXiv - CS - Artificial Intelligence Pub Date : 2021-07-28 , DOI: arxiv-2107.13306
Benno Kruit, Hongyu He, Jacopo Urbani

Tables in scientific papers contain a wealth of valuable knowledge for the scientific enterprise. To help the many of us who frequently consult this type of knowledge, we present Tab2Know, a new end-to-end system that builds a Knowledge Base (KB) from tables in scientific papers. Tab2Know addresses the challenges of automatically interpreting the tables in papers and of disambiguating the entities they contain. To solve these problems, we propose a pipeline that combines statistics-based classifiers with logic-based reasoning. First, the pipeline applies weakly supervised classifiers to recognize the types of tables and columns, with the help of a data labeling system and an ontology designed specifically for this purpose. Then, logic-based reasoning links equivalent entities (via sameAs links) across different tables. An empirical evaluation of our approach on a corpus of Computer Science papers returned satisfactory performance. This suggests that our approach is a promising step toward creating a large-scale KB of scientific knowledge.
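The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: simple majority voting over hand-written labeling functions stands in for the weakly supervised classifiers, and a union-find closure stands in for the logic-based reasoning, covering only the grouping of entities once sameAs pairs are known. The column types ("Metric", "Dataset", "Method"), the labeling functions, and the entity identifiers are all hypothetical and are not taken from the Tab2Know ontology.

```python
from collections import Counter, defaultdict

ABSTAIN = None  # a labeling function returns this when it has no opinion


# Hypothetical labeling functions: each inspects a column header and
# either votes for a column type or abstains.
def lf_metric(header):
    keywords = ("accuracy", "f1", "precision", "recall")
    return "Metric" if any(k in header.lower() for k in keywords) else ABSTAIN


def lf_dataset(header):
    return "Dataset" if "dataset" in header.lower() else ABSTAIN


def lf_method(header):
    return "Method" if header.lower() in ("method", "model", "approach") else ABSTAIN


LABELING_FUNCTIONS = (lf_metric, lf_dataset, lf_method)


def weak_label(header):
    """Majority vote over the non-abstaining labeling functions."""
    votes = [lf(header) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


def same_as_clusters(pairs):
    """Group entities into equivalence classes, treating sameAs as
    reflexive, symmetric, and transitive (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for entity in list(parent):
        clusters[find(entity)].add(entity)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    for header in ("Accuracy (%)", "Model", "Year"):
        print(header, "->", weak_label(header))

    # Hypothetical sameAs pairs between cells of different tables.
    links = [
        ("t1:BERT", "t2:bert"),
        ("t2:bert", "t3:BERT-base"),
        ("t1:SQuAD", "t4:SQuAD"),
    ]
    print(same_as_clusters(links))
```

In this sketch the noisy votes label columns directly. A weak supervision setup would more typically use such votes to produce training labels for a classifier, and the rules that derive the sameAs pairs in the first place are outside the scope of the example.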

Updated: 2021-07-29