Siamese Graph Neural Networks for Data Integration
arXiv - CS - Databases Pub Date : 2020-01-17 , DOI: arxiv-2001.06543
Evgeny Krivosheev, Mattia Atzeni, Katsiaryna Mirylenka, Paolo Scotton, Fabio Casati

Data integration has been studied extensively for decades and approached from different angles. However, the domain remains largely rule-driven and lacks universal automation. Recent developments in machine learning, and in deep learning in particular, have opened the way to more general and more efficient solutions to data integration problems. In this work, we propose a general approach to modeling and integrating entities from structured data, such as relational databases, as well as from unstructured sources, such as free text from news articles. Our approach is designed to explicitly model and leverage relations between entities, thereby using all available information and preserving as much context as possible. This is achieved by combining siamese and graph neural networks, which propagates information between connected entities and supports high scalability. We evaluate our method on the task of integrating data about business entities, and we demonstrate that it outperforms standard rule-based systems, as well as other deep learning approaches that do not use graph-based representations.
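
The abstract does not give architectural details, so the following is only a minimal sketch of the general idea of pairing a siamese setup with graph message passing, not the authors' model. It assumes mean aggregation over neighbors, a tanh nonlinearity, randomly initialized (untrained) weights, and cosine similarity as the matching score; every function name and every toy feature vector here is hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def init_params(dim, layers, seed=0):
    """Random, untrained weights: one (W_self, W_neigh) pair per layer."""
    rng = random.Random(seed)

    def rand_matrix():
        return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]

    return [(rand_matrix(), rand_matrix()) for _ in range(layers)]


def gnn_layer(features, neighbors, W_self, W_neigh):
    """One round of message passing: combine each node's own features with
    the mean of its neighbors' features (an assumed aggregation choice)."""
    out = {}
    for node, x in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            mean = [sum(features[n][i] for n in nbrs) / len(nbrs)
                    for i in range(len(x))]
        else:
            mean = [0.0] * len(x)
        h = [a + b for a, b in zip(matvec(W_self, x), matvec(W_neigh, mean))]
        out[node] = [math.tanh(v) for v in h]
    return out


def encode(graph, target, params):
    """Embed the target entity after propagating information over its graph."""
    features, neighbors = graph
    h = features
    for W_self, W_neigh in params:
        h = gnn_layer(h, neighbors, W_self, W_neigh)
    return h[target]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def siamese_score(graph_a, target_a, graph_b, target_b, params):
    """Siamese comparison: both branches share the same params."""
    return cosine(encode(graph_a, target_a, params),
                  encode(graph_b, target_b, params))


# Toy records (hypothetical): a company node linked to a city and a sector.
neighbors = {"company": ["city", "sector"], "city": ["company"], "sector": ["company"]}
record_db = ({"company": [1.0, 0.0, 0.5], "city": [0.2, 0.9, 0.1],
              "sector": [0.0, 0.3, 0.8]}, neighbors)
record_news = ({"company": [1.0, 0.0, 0.5], "city": [0.9, 0.1, 0.4],
                "sector": [0.0, 0.3, 0.8]}, neighbors)

params = init_params(dim=3, layers=2, seed=42)
score = siamese_score(record_db, "company", record_news, "company", params)
```

Because both branches reuse `params`, two records with identical graphs map to the same embedding, while a change in a neighbor's features (here, the city) shifts the company embedding. In a trained system the weights would be learned from labeled matching and non-matching pairs rather than sampled at random.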

Updated: 2020-01-22