Generating knowledge graphs by employing Natural Language Processing and Machine Learning techniques within the scholarly domain, Future Generation Computer Systems


Future Generation Computer Systems (IF 6.2), Pub Date: 2020-10-31, DOI: 10.1016/j.future.2020.10.026
Danilo Dessì, Francesco Osborne, Diego Reforgiato Recupero, Davide Buscaldi, Enrico Motta

The continuous growth of scientific literature brings innovations and, at the same time, raises new challenges. One of them is that analysing this literature has become difficult: the volume of published papers is high, and annotating and managing them requires manual effort. Novel technological infrastructures are needed to help researchers, research policy makers, and companies browse, analyse, and forecast scientific research in a time-efficient way. Knowledge graphs, i.e., large networks of entities and relationships, have proved to be an effective solution in this space. Scientific knowledge graphs focus on the scholarly domain and typically contain metadata describing research publications, such as authors, venues, organizations, research topics, and citations. However, the current generation of knowledge graphs lacks an explicit representation of the knowledge presented in the research papers. In this paper, we therefore present a new architecture that uses Natural Language Processing and Machine Learning methods to extract entities and relationships from research publications and integrates them into a large-scale knowledge graph. Within this research work, we (i) tackle the challenge of knowledge extraction by employing several state-of-the-art Natural Language Processing and Text Mining tools, (ii) describe an approach for integrating the entities and relationships generated by these tools, (iii) show the advantage of such a hybrid system over alternative approaches, and (iv) as a use case, generate a scientific knowledge graph of 109,105 triples extracted from 26,827 abstracts of papers within the Semantic Web domain. As our approach is general and can be applied to any domain, we expect that it can facilitate the management, analysis, dissemination, and processing of scientific knowledge.
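To make the integration step in (ii) more concrete, the following is a minimal sketch of one way triples from several extraction tools could be merged. It is not the authors' pipeline: the tool names `tool_a` and `tool_b`, the `normalize` and `merge_triples` helpers, the `min_support` threshold, and the sample triples are all assumptions made up for illustration.

```python
from collections import defaultdict

# A triple is (subject, relation, object), all plain strings.
Triple = tuple[str, str, str]


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so surface variants share one key."""
    return " ".join(label.lower().split())


def merge_triples(
    outputs: dict[str, list[Triple]], min_support: int = 1
) -> dict[Triple, set[str]]:
    """Merge triples from several extractors (hypothetical helper).

    Returns each normalized triple mapped to the set of tools that
    produced it, keeping only triples found by at least `min_support` tools.
    """
    support: dict[Triple, set[str]] = defaultdict(set)
    for tool, triples in outputs.items():
        for subj, rel, obj in triples:
            key = (normalize(subj), normalize(rel), normalize(obj))
            support[key].add(tool)
    return {t: tools for t, tools in support.items() if len(tools) >= min_support}


# Made-up extractor outputs, for illustration only.
outputs = {
    "tool_a": [("Ontology Alignment", "uses", "Machine Learning")],
    "tool_b": [
        ("ontology  alignment", "uses", "machine learning"),
        ("SPARQL", "queries", "RDF data"),
    ],
}

merged_all = merge_triples(outputs)                   # union of both tools
merged_agreed = merge_triples(outputs, min_support=2)  # only triples both tools found
```

With `min_support=1` the merge behaves as a plain union after label normalization; raising the threshold trades coverage for agreement between tools. A real system would likely need stronger entity resolution than string normalization.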


Updated: 2020-11-12
