Crowd labeling latent Dirichlet allocation. Knowledge and Information Systems

Knowledge and Information Systems (IF 2.5), Pub Date: 2017-04-19, DOI: 10.1007/s10115-017-1053-1
Luca Pion-Tonachini (1, 2, 3), Scott Makeig (2), Ken Kreutz-Delgado (1, 3)

Large, unlabeled datasets are abundant nowadays, but getting labels for those datasets can be expensive and time-consuming. Crowd labeling is a crowdsourcing approach for gathering such labels from workers whose suggestions are not always accurate. While a variety of algorithms exist for this purpose, we present crowd labeling latent Dirichlet allocation (CL-LDA), a generalization of latent Dirichlet allocation that can solve a more general set of crowd labeling problems. We show that it performs as well as other methods and at times better on a variety of simulated and actual datasets while treating each label as compositional rather than indicating a discrete class. In addition, prior knowledge of workers’ abilities can be incorporated into the model through a structured Bayesian framework. We then apply CL-LDA to the EEG independent component labeling dataset, using its generalizations to further explore the utility of the algorithm. We discuss prospects for creating classifiers from the generated labels.
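To make the idea of a compositional label concrete, here is a minimal sketch that is not CL-LDA itself. It assumes a much simpler stand-in: each item's label is the posterior mean of a Dirichlet-multinomial model over the crowd's votes, with a symmetric prior. The function name `compositional_labels`, the `alpha` parameter, the optional `worker_weight` mapping (a crude way to inject prior belief about worker ability), and the example class names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def compositional_labels(votes, classes, alpha=1.0, worker_weight=None):
    """Turn crowd votes into one probability vector per item.

    votes: iterable of (item, worker, label) tuples; each label must be in classes.
    alpha: symmetric Dirichlet pseudo-count added to every class (assumption).
    worker_weight: optional {worker: weight}; unlisted workers count as 1.0.
    This is a simplified illustration, not the CL-LDA inference procedure.
    """
    worker_weight = worker_weight or {}
    counts = defaultdict(lambda: {c: 0.0 for c in classes})
    for item, worker, label in votes:
        counts[item][label] += worker_weight.get(worker, 1.0)

    result = {}
    for item, per_class in counts.items():
        total = sum(per_class.values()) + alpha * len(classes)
        # Posterior mean of a Dirichlet-multinomial: (n_k + alpha) / (N + K * alpha)
        result[item] = {c: (n + alpha) / total for c, n in per_class.items()}
    return result


# Hypothetical votes on one EEG independent component.
classes = ["brain", "eye", "noise"]
votes = [
    ("ic1", "worker_a", "brain"),
    ("ic1", "worker_b", "brain"),
    ("ic1", "worker_c", "eye"),
]
labels = compositional_labels(votes, classes)
print(labels["ic1"])
```

The output for each item is a vector of class proportions that sums to one rather than a single hard class, which is the sense of "compositional" used in the abstract. Raising a trusted worker's weight shifts the proportions toward that worker's votes.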


Updated: 2017-04-19
