CorDEL: A Contrastive Deep Learning Approach for Entity Linkage
arXiv - CS - Databases Pub Date : 2020-09-15 , DOI: arxiv-2009.07203
Zhengyang Wang, Bunyamin Sisman, Hao Wei, Xin Luna Dong, Shuiwang Ji

Entity linkage (EL) is a critical problem in data cleaning and integration. Over the past several decades, EL has typically been done by rule-based systems or traditional machine learning models with hand-curated features, both of which depend heavily on manual human input. With the ever-increasing growth of new data, deep learning (DL) based approaches have been proposed to alleviate the high cost of EL associated with the traditional models. Existing exploration of DL models for EL strictly follows the well-known twin-network architecture. However, we argue that the twin-network architecture is sub-optimal for EL, leading to inherent drawbacks in existing models. To address these drawbacks, we propose a novel and generic contrastive DL framework for EL. The proposed framework is able to capture both syntactic and semantic matching signals and pays attention to subtle but critical differences. Based on the framework, we develop a contrastive DL approach for EL, called CorDEL, with three powerful variants. We evaluate CorDEL with extensive experiments conducted on both public benchmark datasets and a real-world dataset. CorDEL outperforms previous state-of-the-art models by 5.2% on public benchmark datasets. Moreover, CorDEL yields a 2.4% improvement over the current best DL model on the real-world dataset, while reducing the number of training parameters by 97.6%.

Updated: 2020-10-30