Improving data and model quality in crowdsourcing using cross-entropy-based noise correction, Information Sciences


Information Sciences, Pub Date: 2020-09-08, DOI: 10.1016/j.ins.2020.08.117
Wenqiang Xu, Liangxiao Jiang, Chaoqun Li

Crowdsourcing services provide a fast, efficient, and cost-effective approach to obtaining labeled data, particularly for human-like tasks. In a crowdsourcing scenario, after ground truth inference methods have been employed to obtain integrated instance labels, label noise remains present in the integrated labels. Label noise handling techniques can then be implemented to mitigate the effects of this noise. In this study, we propose a Cross-Entropy-based Noise Correction (CENC) method for crowdsourcing. CENC uses the entropies of the label distributions generated from multiple noisy label sets to filter noisy instances. It then exploits the cross-entropies between each possible true class probability distribution and each predicted class probability distribution to rectify the noisy instances. Using both simulated benchmark data and real-world crowdsourced data, we show that CENC outperforms all other existing state-of-the-art noise correction methods.
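
The abstract names two steps, entropy-based filtering and cross-entropy-based correction, without giving their details. The following is a minimal sketch of those two ideas only, not the authors' CENC algorithm. The entropy threshold, the use of one-hot vectors as the "possible true class" distributions, the summing of cross-entropies over several predictions, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def label_distribution(labels, num_classes):
    """Empirical class distribution of the noisy labels given to one instance."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def entropy(dist):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def cross_entropy(p, q, eps=1e-12):
    """Cross-entropy H(p, q) = -sum_k p_k * log(q_k); q is clipped at eps."""
    return -sum(pk * math.log(max(qk, eps)) for pk, qk in zip(p, q) if pk > 0)


def filter_noisy(crowd_labels, num_classes, threshold):
    """Indices of instances whose label-distribution entropy exceeds threshold.

    The threshold is a hypothetical tuning parameter, not taken from the paper.
    """
    return [
        i
        for i, labels in enumerate(crowd_labels)
        if entropy(label_distribution(labels, num_classes)) > threshold
    ]


def correct_label(predicted_dists, num_classes):
    """Choose the class whose one-hot distribution has the smallest total
    cross-entropy against the predicted class probability distributions.

    Assumption: each candidate "true" distribution is a one-hot vector.
    """
    best_class, best_score = None, math.inf
    for c in range(num_classes):
        one_hot = [1.0 if k == c else 0.0 for k in range(num_classes)]
        score = sum(cross_entropy(one_hot, q) for q in predicted_dists)
        if score < best_score:
            best_class, best_score = c, score
    return best_class


# Toy data: three instances, five crowd labels each, two classes.
crowd_labels = [[0, 0, 0, 0, 0], [0, 1, 0, 1, 1], [1, 1, 1, 1, 0]]
noisy = filter_noisy(crowd_labels, num_classes=2, threshold=0.6)

# Hypothetical classifier outputs for the flagged instance.
predictions = [[0.7, 0.3], [0.6, 0.4], [0.8, 0.2]]
corrected = correct_label(predictions, num_classes=2)
```

In this toy run only the second instance, whose labels split 2 to 3, exceeds the threshold and is flagged; the hypothetical predictions then favor class 0 for it. With one-hot candidates, minimizing the summed cross-entropy is equivalent to picking the class with the largest summed log-probability, so other candidate distributions (for example, smoothed ones) would be needed for the two criteria to differ.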


Updated: 2020-09-08
