Principled analytic classifier for positive-unlabeled learning via weighted integral probability metric
Machine Learning (IF 4.3). Pub Date: 2019-11-04. DOI: 10.1007/s10994-019-05836-9
Yongchan Kwon, Wonyoung Kim, Masashi Sugiyama, Myunghee Cho Paik

We consider the problem of learning a binary classifier from only positive and unlabeled observations (PU learning). Recent PU learning methods have demonstrated strong theoretical and empirical performance. However, most existing algorithms are ill-suited to large-scale datasets because they require repeated computation of a large Gram matrix or extensive hyperparameter optimization. In this paper, we propose a computationally efficient and theoretically grounded PU learning algorithm. The proposed algorithm yields a closed-form classifier when the hypothesis space is a closed ball in a reproducing kernel Hilbert space. In addition, we establish upper bounds on the estimation error and the excess risk; the obtained estimation error bound is sharper than existing results, and the derived excess risk bound has an explicit form that vanishes as the sample sizes increase. Finally, we conduct extensive numerical experiments on both synthetic and real datasets, demonstrating the improved accuracy, scalability, and robustness of the proposed algorithm.
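To make the closed-form idea concrete, the sketch below builds a minimal kernel-mean-embedding classifier for PU data. It is an illustrative simplification, not the paper's exact weighted-IPM estimator: it scores a point by the difference between its average RBF-kernel similarity to the labeled positives and to the unlabeled pool, which is the analytic IPM witness when the hypothesis space is a unit ball in an RKHS. The Gaussian clusters, sample sizes, and kernel bandwidth are all assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative PU setup (assumed, not from the paper): positives ~ N(+2, I),
# negatives ~ N(-2, I) in 2-D; the unlabeled pool mixes both with prior 0.5.
n_p, n_u = 100, 400
X_p = rng.normal(+2.0, 1.0, size=(n_p, 2))                  # labeled positive sample
X_u = np.vstack([rng.normal(+2.0, 1.0, size=(n_u // 2, 2)),  # unlabeled positives
                 rng.normal(-2.0, 1.0, size=(n_u // 2, 2))]) # unlabeled negatives

def rbf(A, B, gamma=0.5):
    """Gaussian RBF kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def score(X):
    """Closed-form decision score: difference of empirical kernel mean
    embeddings of the positive and unlabeled samples, evaluated at X.
    No iterative optimization and no Gram-matrix inversion is needed."""
    return rbf(X, X_p).mean(axis=1) - rbf(X, X_u).mean(axis=1)

# Classify x as positive when score(x) > 0.
X_test = rng.normal(+2.0, 1.0, size=(50, 2))
print((score(X_test) > 0).mean())  # fraction of positives recovered
```

Because the score is a fixed linear functional of the kernel evaluations, training costs nothing beyond storing the samples, which is the sense in which such analytic classifiers scale better than methods that repeatedly solve a kernel system.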
