Neighborhood-based Pooling for Population-level Label Distribution Learning

arXiv - CS - Machine Learning, Pub Date: 2020-03-16, DOI: arxiv-2003.07406
Tharindu Cyril Weerasooriya, Tong Liu, Christopher M. Homan

Supervised machine learning often requires human-annotated data. While annotator disagreement is typically interpreted as evidence of noise, population-level label distribution learning (PLDL) treats the collection of annotations for each data item as a sample of the opinions of a population of human annotators, among whom disagreement may be proper and expected, even with no noise present. From this perspective, a typical training set may contain a large number of very small-sized samples, one for each data item, none of which, by itself, is large enough to be considered representative of the underlying population's beliefs about that item. We propose an algorithmic framework and new statistical tests for PLDL that account for sampling size. We apply them to previously proposed methods for sharing labels across similar data items. We also propose new approaches for label sharing, which we call neighborhood-based pooling.
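
As a rough illustration of the label-sharing idea, the sketch below pools the annotation counts of items whose empirical label distributions are close to one another, then renormalizes. The abstract does not specify how neighborhoods are defined, so the Euclidean distance between label distributions, the `radius` parameter, and the function names here are all assumptions made for illustration, not the authors' method.

```python
from math import sqrt


def normalize(counts):
    """Turn raw label counts into an empirical label distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("item has no annotations")
    return [c / total for c in counts]


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pool_by_neighborhood(label_counts, radius):
    """Hypothetical pooling rule (an assumption, not taken from the paper).

    For each item, sum the label counts of every item whose empirical
    label distribution lies within `radius` of its own (the item itself
    always qualifies), then renormalize the pooled counts.
    """
    dists = [normalize(c) for c in label_counts]
    pooled = []
    for p in dists:
        total = [0] * len(p)
        for q, counts in zip(dists, label_counts):
            if euclidean(p, q) <= radius:
                total = [t + c for t, c in zip(total, counts)]
        pooled.append(normalize(total))
    return pooled


# Three items, five annotations each, three possible labels (made-up data).
label_counts = [
    [4, 1, 0],
    [3, 2, 0],
    [0, 1, 4],
]
pooled = pool_by_neighborhood(label_counts, radius=0.3)
```

With this made-up data, the first two items fall within each other's neighborhood and end up sharing a pooled distribution built from ten annotations instead of five, while the third item has no neighbors and keeps its own distribution. A larger `radius` increases the effective sample size per item at the cost of blurring genuine differences between items.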

Updated: 2020-05-01