Improving Open Information Extraction with Distant Supervision Learning
Neural Processing Letters (IF 2.6) Pub Date: 2021-06-04, DOI: 10.1007/s11063-021-10548-0
Jiabao Han, Hongzhi Wang

Open information extraction (Open IE), one of the essential applications in Natural Language Processing (NLP), has gained great attention in recent years. As a critical technology for building Knowledge Bases (KBs), it converts unstructured natural language sentences into structured representations, usually expressed as triples. Most conventional Open IE approaches rely on manually pre-defined extraction patterns or learn patterns from labeled training examples, both of which require substantial human effort. Additionally, these pipelines involve many NLP tools, which leads to error accumulation and propagation. With the rapid development of neural networks, neural models can minimize the error-propagation problem, but they face the data-hungry nature of supervised learning. In particular, they leverage existing Open IE tools to generate training data, which causes data quality issues. In this paper, we employ a distant supervision learning approach to improve the Open IE task. We conduct extensive experiments with two popular sequence-to-sequence models (an RNN and a Transformer) on a large benchmark data set to demonstrate the performance of our approach.
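
As a rough illustration of the distant-supervision idea described in the abstract (this is our own minimal sketch, not the authors' implementation): sentences are paired with knowledge-base triples whose subject and object both appear in the sentence, and each matched triple is linearized into the flat target string a sequence-to-sequence (RNN or Transformer) decoder would be trained to emit. The KB triples, example sentences, and the <arg1>/<rel>/<arg2> markers below are illustrative assumptions, not the paper's actual data or target format.

```python
# A minimal sketch (not the authors' code) of distant-supervision data
# generation for Open IE: a KB triple whose subject and object both
# occur in a sentence yields a noisy "silver" training example that
# maps the sentence to a linearized triple for a seq2seq model.
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical KB triples and sentences, purely for illustration.
KB_TRIPLES: List[Triple] = [
    ("Barack Obama", "born in", "Honolulu"),
    ("Honolulu", "capital of", "Hawaii"),
]

SENTENCES = [
    "Barack Obama was born in Honolulu , Hawaii .",
    "Honolulu is the capital of the state of Hawaii .",
]

def distant_label(sentence: str, kb: List[Triple]) -> List[Triple]:
    """Return every KB triple whose subject and object both occur in
    the sentence; these matches serve as distant (noisy) supervision."""
    return [(s, r, o) for (s, r, o) in kb if s in sentence and o in sentence]

def linearize(triple: Triple) -> str:
    """Serialize a triple into the flat target string a seq2seq
    (RNN or Transformer) decoder would be trained to emit."""
    s, r, o = triple
    return f"<arg1> {s} </arg1> <rel> {r} </rel> <arg2> {o} </arg2>"

# Build (source sentence, target triple string) training pairs.
training_pairs = [
    (sent, linearize(t))
    for sent in SENTENCES
    for t in distant_label(sent, KB_TRIPLES)
]

for src, tgt in training_pairs:
    print(src, "=>", tgt)
```

Note that in this sketch the second KB triple also matches the first sentence (both "Honolulu" and "Hawaii" occur in it), which illustrates the label noise that distant supervision inevitably introduces and that any approach built on it must contend with.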



Updated: 2021-06-05