Bio-SODA: Enabling Natural Language Question Answering over Knowledge Graphs without Training Data
arXiv - CS - Databases Pub Date : 2021-04-28 , DOI: arxiv-2104.13744
Ana Claudia Sima, Tarcisio Mendes de Farias, Maria Anisimova, Christophe Dessimoz, Marc Robinson-Rechavi, Erich Zbinden, Kurt Stockinger

The problem of natural language processing over structured data has become a growing research field, both within the relational database and the Semantic Web community, with significant efforts involved in question answering over knowledge graphs (KGQA). However, many of these approaches are either specifically targeted at open-domain question answering using DBpedia, or require large training datasets to translate a natural language question to SPARQL in order to query the knowledge graph. Hence, these approaches often cannot be applied directly to complex scientific datasets where no prior training data is available. In this paper, we focus on the challenges of natural language processing over knowledge graphs of scientific datasets. In particular, we introduce Bio-SODA, a natural language processing engine that does not require training data in the form of question-answer pairs for generating SPARQL queries. Bio-SODA uses a generic graph-based approach for translating user questions to a ranked list of SPARQL candidate queries. Furthermore, Bio-SODA uses a novel ranking algorithm that includes node centrality as a measure of relevance for selecting the best SPARQL candidate query. Our experiments with real-world datasets across several scientific domains, including the official bioinformatics Question Answering over Linked Data (QALD) challenge, show that Bio-SODA outperforms publicly available KGQA systems by an F1-score of at least 20% and by an even higher factor on more complex bioinformatics datasets.

Updated: 2021-04-29