Reducing Uncertainty of Schema Matching via Crowdsourcing with Accuracy Rates, IEEE Transactions on Knowledge and Data Engineering



IEEE Transactions on Knowledge and Data Engineering (IF 8.9), Pub Date: 2020-01-01, DOI: 10.1109/tkde.2018.2881185
Chen Jason Zhang , Lei Chen , H. V. Jagadish , Mengchen Zhang , Yongxin Tong

Schema matching is a central challenge for data integration systems. Inspired by the popularity and success of crowdsourcing platforms, we explore the use of crowdsourcing to reduce the uncertainty of schema matching. Since crowdsourcing platforms are most effective for simple questions, we assume that each Correspondence Correctness Question (CCQ) asks the crowd to decide whether a given correspondence should exist in the correct matching. Furthermore, members of a crowd may return incorrect answers, each with a different probability. The accuracy rates of individual crowd workers can serve as attributes of CCQs as well as evaluations of the workers themselves. We prove that the uncertainty reduction equals the entropy of answers minus the entropy of crowds, and we show how to obtain lower and upper bounds for it. We propose frameworks and efficient algorithms that dynamically manage the CCQs to maximize the uncertainty reduction within a limited budget of questions. We develop two novel approaches, namely “Single CCQ” and “Multiple CCQ”, which adaptively select, publish, and manage questions. We verify the value of our solutions through simulation and a real implementation.
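
The "entropy of answers minus entropy of crowds" identity can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a single worker answering a single yes/no CCQ, with symmetric errors governed by one accuracy rate. Under that assumption the expected uncertainty reduction is the mutual information between the correspondence and the answer, H(answer) - H(answer | truth), where the second term is the binary entropy of the accuracy rate. The function names and the greedy selection helper are hypothetical, introduced only for illustration.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def expected_uncertainty_reduction(p_correspondence, accuracy):
    """Expected entropy reduction from one CCQ answered by one worker.

    Assumptions (not taken from the paper): the worker answers "yes" or
    "no", and is right with probability `accuracy` regardless of the truth.
    `p_correspondence` is the current probability that the correspondence
    belongs to the correct matching.
    """
    # Probability that the worker says "yes".
    p_yes = (p_correspondence * accuracy
             + (1.0 - p_correspondence) * (1.0 - accuracy))
    entropy_of_answer = binary_entropy(p_yes)
    entropy_of_crowd = binary_entropy(accuracy)  # noise added by the worker
    return entropy_of_answer - entropy_of_crowd


def pick_best_ccq(candidates, accuracy):
    """Greedy choice (hypothetical helper): the correspondence whose CCQ
    has the largest expected uncertainty reduction.

    `candidates` maps a correspondence label to its current probability.
    """
    return max(
        candidates,
        key=lambda c: expected_uncertainty_reduction(candidates[c], accuracy),
    )


if __name__ == "__main__":
    probs = {"name~full_name": 0.9, "addr~address": 0.55, "id~zip": 0.05}
    for label, p in probs.items():
        gain = expected_uncertainty_reduction(p, accuracy=0.8)
        print(f"{label}: {gain:.4f} bits")
    print("ask first:", pick_best_ccq(probs, accuracy=0.8))
```

In this simplified model a perfectly accurate worker recovers the full binary entropy of the correspondence, while a worker with accuracy 0.5 contributes nothing. This is why the accuracy rate matters when a limited question budget has to be spent.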


Updated: 2020-01-01
