A Unified View of Label Shift Estimation
arXiv - CS - Machine Learning. Pub Date: 2020-03-17, DOI: arxiv-2003.07554
Saurabh Garg, Yifan Wu, Sivaraman Balakrishnan, Zachary C. Lipton

Under label shift, the label distribution p(y) might change but the class-conditional distributions p(x|y) do not. There are two dominant approaches for estimating the label marginal. BBSE, a moment-matching approach based on confusion matrices, is provably consistent and provides interpretable error bounds. However, a maximum likelihood estimation approach, which we call MLLS, dominates empirically. In this paper, we present a unified view of the two methods and the first theoretical characterization of MLLS. Our contributions include (i) consistency conditions for MLLS, which include calibration of the classifier and a confusion matrix invertibility condition that BBSE also requires; (ii) a unified framework, casting BBSE as roughly equivalent to MLLS for a particular choice of calibration method; and (iii) a decomposition of MLLS's finite-sample error into terms reflecting miscalibration and estimation error. Our analysis attributes BBSE's statistical inefficiency to a loss of information due to coarse calibration. Experiments on synthetic data, MNIST, and CIFAR10 support our findings.
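
To make the two estimators concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy three-class problem in which the classifier outputs the exact source posterior, so it is perfectly calibrated by construction. The names `simulate`, `bbse_weights`, and `mlls_weights` are hypothetical, and the MLLS objective is maximized here with a simple EM fixed-point iteration, which is one common choice rather than necessarily the optimizer used in the paper. Both functions return estimates of the importance weights w(y) = p_t(y) / p_s(y).

```python
import numpy as np


def simulate(prior, n, rng):
    """Toy data (an assumption for illustration): x equals y with
    probability 0.8 and each other class with probability 0.1.
    Under a uniform source prior, the exact source posterior
    p_s(y | x) is 0.8 for y == x and 0.1 otherwise."""
    y = rng.choice(3, size=n, p=prior)
    x = (y + rng.choice(3, size=n, p=[0.8, 0.1, 0.1])) % 3
    probs = np.full((n, 3), 0.1)
    probs[np.arange(n), x] = 0.8
    return probs, y


def bbse_weights(source_probs, source_labels, target_probs):
    """Moment matching on hard predictions: solve C w = mu."""
    k = source_probs.shape[1]
    source_pred = source_probs.argmax(axis=1)
    target_pred = target_probs.argmax(axis=1)
    # C[i, j] estimates p_s(yhat = i, y = j) from labeled source data.
    C = np.zeros((k, k))
    np.add.at(C, (source_pred, source_labels), 1.0)
    C /= len(source_labels)
    # mu[i] estimates p_t(yhat = i) from unlabeled target data.
    mu = np.bincount(target_pred, minlength=k) / len(target_pred)
    # Requires C to be invertible (the condition named in the abstract).
    return np.linalg.solve(C, mu)


def mlls_weights(target_probs, source_prior, n_iter=500):
    """Maximum likelihood over the target label marginal via EM,
    using the classifier's soft (assumed calibrated) outputs."""
    target_prior = source_prior.copy()
    for _ in range(n_iter):
        # E-step: reweight source posteriors by the current prior ratio.
        post = target_probs * (target_prior / source_prior)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: the new target prior is the mean adjusted posterior.
        target_prior = post.mean(axis=0)
    return target_prior / source_prior


rng = np.random.default_rng(0)
source_prior = np.full(3, 1.0 / 3.0)
target_prior_true = np.array([0.6, 0.3, 0.1])

source_probs, source_labels = simulate(source_prior, 20000, rng)
target_probs, _ = simulate(target_prior_true, 20000, rng)  # labels unused

w_true = target_prior_true / source_prior
w_bbse = bbse_weights(source_probs, source_labels, target_probs)
w_mlls = mlls_weights(target_probs, source_prior)
print("true:", w_true)
print("BBSE:", w_bbse)
print("MLLS:", w_mlls)
```

In this toy setup the classifier output takes only three distinct values, so thresholding to hard labels discards nothing and the two estimates should be close. The information loss the abstract attributes to coarse calibration would only show up with a classifier whose soft outputs carry more detail than its argmax.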

Updated: 2020-10-20