PoWareMatch: a Quality-aware Deep Learning Approach to Improve Human Schema Matching, arXiv - CS - Human-Computer Interaction


arXiv - CS - Human-Computer Interaction, Pub Date: 2021-09-15, DOI: arxiv-2109.07321
Roee Shraga, Avigdor Gal

Schema matching is a core task of any data integration process. Although it has been investigated in the fields of databases, AI, Semantic Web, and data mining for many years, the main challenge remains the ability to generate quality matches among data concepts (e.g., database attributes). In this work, we examine a novel angle on the behavior of humans as matchers, studying match creation as a process. We analyze the dynamics of common evaluation measures (precision, recall, and f-measure) with respect to this angle and highlight the need for unbiased matching to support this analysis. Unbiased matching, a newly defined concept that describes the common assumption that human decisions represent reliable assessments of schemata correspondences, is, however, not an inherent property of human matchers. In what follows, we design PoWareMatch, which uses a deep learning mechanism to calibrate and filter human matching decisions adhering to the quality of a match; these decisions are then combined with algorithmic matching to generate better match results. We provide empirical evidence, based on an experiment with more than 200 human matchers over common benchmarks, that PoWareMatch predicts well the benefit of extending the match with an additional correspondence and generates high-quality matches. In addition, PoWareMatch outperforms state-of-the-art matching algorithms.
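
The dynamics of precision, recall, and f-measure that the abstract refers to can be illustrated with a minimal Python sketch. It is not the PoWareMatch method and involves no deep learning; it only recomputes the three standard measures each time a matcher adds one more correspondence to the match. The attribute names, the reference match, and the helper functions `prf` and `measure_dynamics` are hypothetical and chosen purely for illustration.

```python
from typing import List, Set, Tuple

# A correspondence pairs a source attribute with a target attribute.
Correspondence = Tuple[str, str]


def prf(match: Set[Correspondence],
        reference: Set[Correspondence]) -> Tuple[float, float, float]:
    """Return (precision, recall, f-measure) of a match against a reference match."""
    true_positives = len(match & reference)
    precision = true_positives / len(match) if match else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_measure


def measure_dynamics(decisions: List[Correspondence],
                     reference: Set[Correspondence]) -> List[Tuple[float, float, float]]:
    """Recompute the measures after each decision is added, in decision order."""
    match: Set[Correspondence] = set()
    history = []
    for correspondence in decisions:
        match.add(correspondence)
        history.append(prf(match, reference))
    return history


# Hypothetical reference match and a hypothetical sequence of human decisions.
reference = {
    ("orderId", "order_no"),
    ("custName", "client_name"),
    ("addr", "address"),
}
decisions = [
    ("orderId", "order_no"),   # correct
    ("custName", "address"),   # incorrect
    ("addr", "address"),       # correct
]

history = measure_dynamics(decisions, reference)
for step, (p, r, f) in enumerate(history, start=1):
    print(f"step {step}: precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

In this toy sequence the second, incorrect decision lowers precision without raising recall, so the f-measure drops before the third decision recovers it. That is the kind of effect at stake when deciding whether extending a match with an additional correspondence is beneficial.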


Updated: 2021-09-16
