Resampling-based noise correction for crowdsourcing
Journal of Experimental & Theoretical Artificial Intelligence (IF 2.2), Pub Date: 2020-08-17
Wenqiang Xu, Liangxiao Jiang, Chaoqun Li

Crowdsourcing services provide an economical and efficient means of acquiring multiple noisy labels for each training instance in supervised learning. Ground truth inference methods, also known as consensus methods, are then used to obtain the integrated labels of the training instances. Although consensus methods are effective, a certain level of noise remains in the set of integrated labels. It is therefore necessary to handle this noise in order to improve label and model quality. In this paper, we propose a resampling-based noise correction method (RNC for short). Unlike previous label noise correction methods for crowdsourcing, RNC first employs a filter to split the training data into a clean set and a noisy set, and then repeatedly resamples the clean and noisy sets in a fixed proportion. Finally, multiple classifiers built on the resampled data sets are used to re-label the training data. Experimental results on 18 simulated data sets and five real-world data sets demonstrate that, compared with three other state-of-the-art noise correction methods, RNC rarely degrades label and model quality and, in many cases, improves it dramatically.
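
The abstract describes RNC as a three-step pipeline: filter the integrated labels into a clean set and a noisy set, repeatedly resample both sets in a fixed proportion, and re-label the training data by combining the predictions of classifiers trained on the resampled sets. The Python sketch below illustrates that pipeline under assumptions: the function name rnc_correct, the cross-validation-based filter, the decision-tree base learner, and the parameters n_rounds and clean_ratio are illustrative choices, not details taken from the paper.

```python
# A minimal sketch of the RNC idea, not the authors' reference implementation.
# Assumes numpy arrays and integer-encoded class labels (0, 1, ..., K-1).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict


def rnc_correct(X, y_integrated, n_rounds=10, clean_ratio=0.7, random_state=0):
    """Correct noisy integrated labels by filtering, resampling, and re-labeling."""
    rng = np.random.RandomState(random_state)

    # Step 1 (filter): flag an instance as noisy when a cross-validated
    # classifier disagrees with its integrated label (one common filter choice;
    # the abstract does not commit to a specific filter).
    cv_pred = cross_val_predict(
        DecisionTreeClassifier(random_state=random_state), X, y_integrated, cv=5
    )
    clean_idx = np.where(cv_pred == y_integrated)[0]
    noisy_idx = np.where(cv_pred != y_integrated)[0]

    # Step 2 (resample): repeatedly draw from the clean and noisy sets in a
    # fixed proportion and train one classifier per resampled set.
    sample_size = len(clean_idx)
    votes = np.zeros((n_rounds, len(y_integrated)), dtype=int)
    for r in range(n_rounds):
        n_clean = int(clean_ratio * sample_size)
        draws = [rng.choice(clean_idx, size=n_clean, replace=True)]
        if len(noisy_idx) > 0:
            draws.append(rng.choice(noisy_idx, size=sample_size - n_clean, replace=True))
        idx = np.concatenate(draws)
        clf = DecisionTreeClassifier(random_state=r)
        clf.fit(X[idx], y_integrated[idx])
        votes[r] = clf.predict(X)

    # Step 3 (re-label): replace each integrated label with the majority vote
    # of the classifiers built on the resampled data sets.
    return np.array([np.bincount(votes[:, i]).argmax() for i in range(votes.shape[1])])
```

A typical call would be y_corrected = rnc_correct(X, y_integrated), where y_integrated holds the integer-encoded labels produced by a consensus method such as majority voting.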



Updated: 2020-08-17