End-to-end relation extraction based on bootstrapped multi-level distant supervision
World Wide Web (IF 3.7), Pub Date: 2020-04-24, DOI: 10.1007/s11280-020-00816-9
Ying He, Zhixu Li, Qiang Yang, Zhigang Chen, An Liu, Lei Zhao, Xiaofang Zhou

Distantly supervised relation extraction has been widely used to identify new relation facts from free text, because an existing knowledge base lets these models build a large dataset with little human intervention and at low cost in manpower and time. However, existing distantly supervised models are all based on a single-node classifier, so they suffer from a serious miscategorization problem, especially when there are thousands of relations. In this paper, we propose a novel end-to-end model for relation extraction based on distant supervision. Our model divides the original categorization task into a number of sub-tasks that build a tree-like categorization structure over multiple levels. With this structure, an unlabelled relation instance can be categorized step by step along a path from the root node to a leaf node. An additional benefit of the structure is that it can be used to select negative samples from the training data for each child node. In addition, to the best of our knowledge, no effort has been made to update the categorization model with newly identified relation facts, which hinders improvement of extraction precision and recall. Although bootstrapping methods can help, they need additional computation to evaluate the quality of extracted patterns or tuples when selecting new instances for the next iteration. We therefore propose bootstrapped distant supervision, which iteratively updates the distant supervision model with newly learned relation facts and uses the scores obtained directly from the model to evaluate instance quality, with no additional computation. As a result, we can further improve extraction precision and recall. To save time and manpower, we also propose an adaptive method that uses a mapping function to choose suitable thresholds for each iteration, rather than relying on manually chosen fixed thresholds.
Experimental results on three real datasets show that our approach outperforms state-of-the-art approaches, achieving more than 12% better extraction quality.
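
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It illustrates (1) categorizing an instance step by step along a root-to-leaf path in a tree of classifiers, and (2) a bootstrapping loop that accepts new instances using the score the model already produced, with a per-iteration threshold. Everything named here is an assumption made for illustration: the `Node` class, the keyword-based `keyword_scorer` (a toy stand-in for a trained classifier), the product-of-scores confidence, the linearly decaying `adaptive_threshold` (the abstract does not specify the mapping function), and the `retrain` hook, which is only a placeholder for the model update.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer type: maps a sentence to a score in [0, 1].
Scorer = Callable[[str], float]


@dataclass
class Node:
    label: str
    scorer: Scorer
    children: List["Node"] = field(default_factory=list)


def keyword_scorer(keywords: List[str]) -> Scorer:
    """Toy stand-in for a trained node classifier: fraction of keywords found."""
    def score(text: str) -> float:
        lowered = text.lower()
        return sum(k in lowered for k in keywords) / len(keywords)
    return score


def classify(root: Node, text: str) -> Tuple[List[str], float]:
    """Walk from the root to a leaf, taking the best-scoring child at each level."""
    path, confidence, node = [root.label], 1.0, root
    while node.children:
        scored = [(child.scorer(text), child) for child in node.children]
        best_score, node = max(scored, key=lambda pair: pair[0])
        confidence *= best_score  # assumed: path confidence = product of scores
        path.append(node.label)
    return path, confidence


def adaptive_threshold(iteration: int, start: float = 0.9,
                       step: float = 0.25, floor: float = 0.4) -> float:
    """Hypothetical mapping from iteration number to acceptance threshold."""
    return max(floor, start - step * iteration)


def bootstrap(root: Node, unlabeled: List[str], iterations: int,
              retrain: Callable[[Dict[str, str]], None]) -> Dict[str, str]:
    """Accept instances whose model score clears the current threshold."""
    accepted: Dict[str, str] = {}
    remaining = list(unlabeled)
    for iteration in range(iterations):
        threshold = adaptive_threshold(iteration)
        still_unlabeled = []
        for text in remaining:
            path, confidence = classify(root, text)
            if confidence >= threshold:
                accepted[text] = path[-1]  # leaf label becomes the relation
            else:
                still_unlabeled.append(text)
        remaining = still_unlabeled
        retrain(accepted)  # placeholder: a real system would update the scorers
    return accepted


def demo_tree() -> Node:
    """Two-level toy hierarchy with invented relation names."""
    person = Node("person", keyword_scorer(["born", "married"]), [
        Node("place_of_birth", keyword_scorer(["born in"])),
        Node("spouse", keyword_scorer(["married to"])),
    ])
    organization = Node("organization", keyword_scorer(["founded", "headquartered"]), [
        Node("founder", keyword_scorer(["founded by"])),
        Node("headquarters", keyword_scorer(["headquartered in"])),
    ])
    return Node("root", lambda _text: 1.0, [person, organization])


if __name__ == "__main__":
    tree = demo_tree()
    print(classify(tree, "Alice was born in Paris"))
```

Because the acceptance test reuses the confidence returned by `classify`, no separate pattern or tuple scoring pass is needed, which is the saving the abstract attributes to bootstrapped distant supervision. The keyword scorers here never change, so the `retrain` hook only marks where newly accepted facts would feed back into the model.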

Updated: 2020-04-24