Estimating the class prior for positive and unlabelled data via logistic regression
Advances in Data Analysis and Classification (IF 1.6), Pub Date: 2021-06-03, DOI: 10.1007/s11634-021-00444-9
Małgorzata Łazęcka , Jan Mielniczuk , Paweł Teisseyre

In the paper, we revisit the problem of class prior probability estimation with positive and unlabelled data gathered in a single-sample scenario. The task is important because, in the positive-unlabelled setting, a classifier can be successfully learned provided the class prior is available. We show that without additional assumptions the class prior probability is not identifiable, and thus existing non-parametric estimators are necessarily biased in general; the magnitude of their bias is also investigated. The problem becomes identifiable when the probabilistic structure satisfies mild semi-parametric assumptions. Consequently, we propose a method based on a logistic fit and a concave minorization of its (non-concave) log-likelihood. Experiments conducted on artificial and benchmark datasets, as well as on the large clinical database MIMIC, indicate that the estimation errors of the proposed method are usually lower than those of its competitors and that it is robust against departures from the logistic setting.
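To make the setting concrete, the sketch below illustrates the single-sample PU model the abstract refers to: each observation carries a labelling indicator s, and under the standard selected-completely-at-random assumption P(s=1|x) = c·σ(wᵀx), where c is the unknown labelling frequency and σ the logistic function. The class prior is then recovered as the average of the fitted posterior σ(wᵀx). This is only a minimal illustration under those assumptions; it maximises the observed log-likelihood numerically with SciPy rather than using the paper's concave-minorization scheme, and all function and variable names are hypothetical.

```python
# Minimal sketch: class-prior estimation from positive-unlabelled data via a
# logistic fit, assuming the single-sample SCAR model P(s=1|x) = c * sigmoid(w'x).
# Numerical maximisation stands in for the paper's concave minorization.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # numerically stable sigmoid

def fit_pu_logistic(X, s):
    """Fit (w, c) by maximising the PU log-likelihood; return (w, c, pi_hat)."""
    n, d = X.shape

    def neg_log_lik(params):
        w, c = params[:d], expit(params[d])        # reparametrise c to stay in (0, 1)
        p = np.clip(c * expit(X @ w), 1e-12, 1 - 1e-12)
        return -np.sum(s * np.log(p) + (1 - s) * np.log(1 - p))

    res = minimize(neg_log_lik, np.zeros(d + 1), method="L-BFGS-B")
    w_hat, c_hat = res.x[:d], expit(res.x[d])
    pi_hat = np.mean(expit(X @ w_hat))             # class prior = E[P(y=1 | x)]
    return w_hat, c_hat, pi_hat

# Tiny synthetic check: logistic positives, labelling frequency 0.5.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
w_true = np.array([-0.4, 1.5, -1.0])
y = rng.binomial(1, expit(X @ w_true))             # true (latent) class labels
s = y * rng.binomial(1, 0.5, size=n)               # only some positives are labelled
_, c_hat, pi_hat = fit_pu_logistic(X, s)
print(f"estimated labelling frequency c = {c_hat:.3f}, class prior pi = {pi_hat:.3f}")
```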



Updated: 2021-06-04