CrowdWT, ACM Transactions on Knowledge Discovery from Data


CrowdWT
ACM Transactions on Knowledge Discovery from Data (IF 4.0), Pub Date: 2020-12-07, DOI: 10.1145/3421712
Jinzheng Tu ₁, Guoxian Yu ₁, Jun Wang ₁, Carlotta Domeniconi ₂, Maozu Guo ₃, Xiangliang Zhang ₄


Crowdsourcing is a relatively inexpensive and efficient mechanism for collecting annotations of data from the open Internet. Crowdsourcing workers are paid for the annotations they provide, but the task requester usually has a limited budget. It is therefore desirable to assign each task to the right workers, so that the overall annotation quality is maximized while the cost is reduced. In this article, we propose a novel task assignment strategy (CrowdWT) to capture the complex interactions between tasks and workers, and to properly assign tasks to workers. CrowdWT first develops a Worker Bias Model (WBM) to jointly model the worker's bias, the ground truths of tasks, and the task features. WBM constructs a mapping between task features and worker annotations to dynamically assign each task to a group of workers who are more likely to give correct annotations for it. CrowdWT further introduces a Task Difficulty Model (TDM), which builds a kernel ridge regressor based on task features to quantify the intrinsic difficulty of tasks and thus to assign the difficult tasks to more reliable workers. Finally, CrowdWT combines WBM and TDM into a unified model to dynamically assign tasks to a group of workers and to recall more reliable, even expert, workers to annotate the difficult tasks. Our experimental results on two real-world datasets and two semi-synthetic datasets show that CrowdWT achieves high-quality answers within a limited budget and performs best among the competitive methods compared.
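
The TDM idea in the abstract (regress a difficulty score on task features, then route hard tasks to more reliable workers) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the model's details, so the RBF kernel, the regularization weight `lam`, the difficulty threshold `hard`, the worker reliability scores, and every function name below are assumptions made purely for illustration.

```python
import math

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_krr(X, y, lam=0.1, gamma=1.0):
    """Kernel ridge regression: alpha = (K + lam * I)^{-1} y."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)

def predict(X, alpha, x, gamma=1.0):
    """Predicted difficulty of a new task with feature vector x."""
    return sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, X))

def pick_workers(difficulty, reliability, k=2, hard=0.5):
    """Hypothetical routing rule: hard tasks get the k most reliable workers,
    easy tasks get the k least reliable ones, keeping experts free."""
    ranked = sorted(reliability, key=reliability.get, reverse=True)
    return ranked[:k] if difficulty >= hard else ranked[-k:]

# Toy data (hypothetical): one feature per task, difficulty scores in [0, 1].
X_train = [[0.0], [1.0]]
y_train = [0.1, 0.9]
alpha = fit_krr(X_train, y_train)
workers = {"w1": 0.9, "w2": 0.6, "w3": 0.75}
for task in ([0.0], [1.0]):
    d = predict(X_train, alpha, task)
    print(task, round(d, 3), pick_workers(d, workers))
```

The linear system is solved in plain Python only to keep the sketch dependency-free. How CrowdWT actually couples the difficulty estimate with the Worker Bias Model is not specified in the abstract and is not modeled here.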


Updated: 2020-12-07
