Multi-label crowd consensus via joint matrix factorization
Knowledge and Information Systems (IF 2.5), Pub Date: 2019-07-25, DOI: 10.1007/s10115-019-01386-7
Jinzheng Tu, Guoxian Yu, Carlotta Domeniconi, Jun Wang, Guoqiang Xiao, Maozu Guo

Crowdsourcing is a useful and economical approach to annotating data. Various computational solutions have been developed to reach a high-quality consensus. However, available solutions mainly target single-label tasks and neglect correlations among labels. In this paper, we introduce a multi-label crowd consensus (MLCC) model based on joint matrix factorization. Specifically, MLCC selectively and jointly factorizes the sample-label association matrices into products of individual and shared low-rank matrices. In this way, it exploits the robustness of low-rank matrix approximation to noisy annotations and diminishes the impact of unreliable annotators by assigning small weights to their annotation matrices. To obtain coherent low-rank matrices, MLCC additionally leverages the shared low-rank matrix to model correlations among labels, and the individual low-rank matrices to measure the similarity between annotators. MLCC then computes the low-rank matrices and weights via a unified objective function, and adopts an alternating optimization technique to iteratively optimize them. Finally, MLCC uses the optimized low-rank matrices and weights to compute the consensus labels. Our experimental results demonstrate that MLCC outperforms competitive methods in inferring consensus labels. Besides identifying spammers, MLCC achieves robustness against their incorrect annotations by assigning them small or zero weights.
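The abstract does not state MLCC's objective function, so the following is only a minimal Python sketch of the general idea under several assumptions of ours: every annotator labels every sample with 0/1 values; each annotation matrix A_w is approximated as U_w Vᵀ, with a per-annotator factor U_w and a label factor V shared across annotators; both factors receive ridge penalties; and the annotator weights lie on the probability simplex with a quadratic penalty, a swapped-in choice that lets weights reach exactly zero. The function names, `lam`, `gamma`, and the 0.5 decision threshold are hypothetical, and the sketch omits the annotator-similarity term the abstract mentions. It alternates closed-form updates for U_w, V, and the weights, then thresholds the weighted reconstruction.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of a 1-D array v onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, v.size + 1)
    rho = ks[u - (css - 1.0) / ks > 0][-1]  # number of entries that stay positive
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)


def joint_mf_consensus(annotations, rank=2, lam=0.1, gamma=0.05, n_iter=30):
    """Hypothetical simplified sketch, NOT the paper's exact MLCC objective.

    annotations: list of (n_samples, n_labels) 0/1 arrays, one per annotator,
    assumed fully observed. Alternately minimizes
      (sum_w alpha_w*(||A_w - U_w V^T||^2 + lam*||U_w||^2) + lam*||V||^2) / (n*q)
      + gamma*||alpha||^2,  with alpha on the probability simplex.
    Returns (consensus 0/1 label matrix, annotator weights alpha).
    """
    A = [np.asarray(a, dtype=float) for a in annotations]
    m = len(A)
    n, q = A[0].shape
    eye = np.eye(rank)
    alpha = np.full(m, 1.0 / m)  # start from uniform annotator weights

    # Initialize the shared label factor V (q x rank) from the mean annotation matrix.
    _, s, vt = np.linalg.svd(np.mean(A, axis=0), full_matrices=False)
    V = vt[:rank].T * np.sqrt(s[:rank])

    for _ in range(n_iter):
        # U-step: ridge regression of each A_w onto the shared factor,
        # U_w = A_w V (V^T V + lam I)^{-1}.
        gram = V.T @ V + lam * eye
        U = [np.linalg.solve(gram, V.T @ a.T).T for a in A]
        # V-step: weighted ridge regression pooling all annotators,
        # V = (sum_w alpha_w A_w^T U_w)(sum_w alpha_w U_w^T U_w + lam I)^{-1}.
        lhs = sum(w * u.T @ u for w, u in zip(alpha, U)) + lam * eye
        rhs = sum(w * a.T @ u for w, a, u in zip(alpha, A, U))
        V = np.linalg.solve(lhs, rhs.T).T
        # alpha-step: per-entry error of each annotator, then the closed-form
        # minimizer is the simplex projection of -err / (2 * gamma).
        err = np.array([np.sum((a - u @ V.T) ** 2) + lam * np.sum(u ** 2)
                        for a, u in zip(A, U)]) / (n * q)
        alpha = project_to_simplex(-err / (2.0 * gamma))

    # Consensus: weighted low-rank reconstruction, thresholded at 0.5.
    scores = sum(w * u @ V.T for w, u in zip(alpha, U))
    return (scores >= 0.5).astype(int), alpha
```

Because the weight step minimizes a linear error term plus `gamma * ||alpha||^2` over the simplex, its solution is a Euclidean projection that sets the weights of annotators with large reconstruction error to exactly zero; increasing `gamma` spreads the weight more evenly across annotators. Calling `joint_mf_consensus([A1, A2, A3])` returns the consensus label matrix together with the learned weights.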

Updated: 2019-07-25