Improving crowd labeling using Stackelberg models
International Journal of Machine Learning and Cybernetics (IF 3.1), Pub Date: 2021-01-26, DOI: 10.1007/s13042-021-01276-x
Wenjun Yang, Chaoqun Li

Crowdsourcing systems provide an easy means of acquiring labeled training data for supervised learning. However, the labels provided by non-expert crowd workers (labelers) are often of low quality. To address this problem, in practice each sample is typically given a set of multiple noisy labels by different labelers, and ground truth inference algorithms are then used to obtain an integrated label for each sample. Ground truth inference methods therefore directly determine the label quality of the samples. In this paper, we propose a novel label integration method based on game theory. We assume that there is an adversary in the crowdsourcing system who intentionally provides incorrect integrated labels. We model the interaction between the data miner and the adversary as a Stackelberg game in which one player (the data miner) controls the predictive model, whereas the other (the adversary) tries to choose the integrated labels that would be most harmful to the current classifier. On this basis, we transform the label integration problem into a repeated Stackelberg model. We call our method Stackelberg label inference (SLI). SLI does not need to estimate the quality of labelers, and so avoids the chicken-and-egg problem that can lead to poor results. Moreover, because SLI makes little use of the multiple noisy label sets in the noisy data set, it is not very sensitive to the number of labelers, and it shows better performance when the number of labelers is relatively small. In terms of both label quality and model quality, the experimental results show that SLI outperforms the other state-of-the-art ground truth inference methods it was compared against.
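For readers unfamiliar with the setting, the following minimal sketch illustrates what "ground truth inference" over multiple noisy label sets means, using plain majority voting. This is a common baseline and is not the SLI method, whose details the abstract does not give. The function names, the tie-breaking rule, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

def majority_vote(noisy_labels):
    """Integrate one sample's noisy label set by majority voting.

    Ties are broken by taking the smallest label (an arbitrary,
    illustrative choice so that the result is deterministic).
    """
    counts = Counter(noisy_labels)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)

def integrate(label_sets):
    """Return one integrated label per sample."""
    return [majority_vote(labels) for labels in label_sets]

def label_quality(integrated, truth):
    """Fraction of integrated labels that match the true labels."""
    return sum(a == b for a, b in zip(integrated, truth)) / len(truth)

# Hypothetical toy data: 4 samples, each labeled by 3 crowd workers.
label_sets = [[1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 1]]
truth = [1, 0, 1, 0]

integrated = integrate(label_sets)
print(integrated, label_quality(integrated, truth))
```

With only three labelers per sample, a single sample on which two labelers err is enough to produce a wrong integrated label, which is the kind of small-labeler regime the abstract says SLI targets.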




Updated: 2021-01-28