A flexible procedure for mixture proportion estimation in positive‐unlabeled learning
Statistical Analysis and Data Mining ( IF 2.1 ) Pub Date : 2020-01-10 , DOI: 10.1002/sam.11447
Zhenfeng Lin 1 , James P. Long 2

Positive‐unlabeled (PU) learning considers two samples: a positive set P with observations from only one class, and an unlabeled set U with observations from two classes. The goal is to classify the observations in U. Class mixture proportion estimation (MPE) in U is a key step in PU learning. Blanchard et al. showed that MPE in PU learning is a generalization of the problem of estimating the proportion of true null hypotheses in multiple testing. Motivated by this idea, we propose reducing the problem to one dimension by constructing a probabilistic classifier trained on the P and U data sets, then applying a one‐dimensional mixture proportion method from the multiple testing literature to the resulting class probabilities. The flexibility of this framework lies in the freedom to choose both the classifier and the one‐dimensional MPE method. We prove consistency of two mixture proportion estimators using bounds from empirical process theory, develop tuning‐parameter‐free implementations, and demonstrate that they have competitive performance on simulated waveform data and a protein signaling problem.
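The two-step reduction described above can be sketched on simulated one-dimensional data. This is a minimal illustration, not the paper's implementation: the Gaussian components, the tiny logistic-regression classifier, and the choice of Storey's estimator (with threshold λ = 0.5) as the one-dimensional MPE method are all assumptions made for the sketch. Scores of the unlabeled points are converted to p-value-like quantities via the empirical CDF of the positive-set scores, so the positive component is approximately uniform, which is the multiple-testing analogy the framework exploits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: positives ~ N(2, 1); the unlabeled set is a
# mixture with true positive proportion alpha = 0.3 and negatives ~ N(-1, 1).
alpha = 0.3
n_p, n_u = 2000, 2000
P = rng.normal(2.0, 1.0, n_p)
n_pos = int(alpha * n_u)
U = np.concatenate([rng.normal(2.0, 1.0, n_pos),
                    rng.normal(-1.0, 1.0, n_u - n_pos)])

# Step 1: reduce to one dimension with a probabilistic classifier trained to
# separate P (label 1) from U (label 0).  A tiny logistic regression fitted
# by gradient descent stands in for any probabilistic classifier here.
X = np.concatenate([P, U])
y = np.concatenate([np.ones(n_p), np.zeros(n_u)])
w, b = 0.0, 0.0
for _ in range(500):
    z = 1.0 / (1.0 + np.exp(-(w * X + b)))
    w -= 0.1 * np.mean((z - y) * X)
    b -= 0.1 * np.mean(z - y)

def score(x):
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

# Step 2: map unlabeled scores through the empirical CDF of the P scores;
# positive-component values are then roughly uniform on (0, 1), while
# negatives pile up near 0.  Apply Storey's estimator from the multiple
# testing literature to estimate the positive proportion in U.
s_p = np.sort(score(P))
p_vals = np.searchsorted(s_p, score(U), side="right") / n_p

lam = 0.5  # Storey threshold; an assumed fixed choice for this sketch
alpha_hat = min(1.0, np.mean(p_vals > lam) / (1.0 - lam))
print(f"estimated mixture proportion: {alpha_hat:.2f}")
```

Because the p-values depend only on the ranks of the unlabeled scores among the positive-set scores, any classifier whose score is monotone in the density ratio yields the same estimate, which is one way to see the flexibility of the framework.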
