A Unified View of Label Shift Estimation, arXiv - CS - Machine Learning



A Unified View of Label Shift Estimation
arXiv - CS - Machine Learning, Pub Date: 2020-03-17, DOI: arxiv-2003.07554
Saurabh Garg, Yifan Wu, Sivaraman Balakrishnan, Zachary C. Lipton

Under label shift, the label distribution p(y) might change but the class-conditional distributions p(x|y) do not. There are two dominant approaches for estimating the label marginal. BBSE, a moment-matching approach based on confusion matrices, is provably consistent and provides interpretable error bounds. However, a maximum likelihood estimation approach, which we call MLLS, dominates empirically. In this paper, we present a unified view of the two methods and the first theoretical characterization of MLLS. Our contributions include (i) consistency conditions for MLLS, which include calibration of the classifier and a confusion matrix invertibility condition that BBSE also requires; (ii) a unified framework, casting BBSE as roughly equivalent to MLLS for a particular choice of calibration method; and (iii) a decomposition of MLLS's finite-sample error into terms reflecting miscalibration and estimation error. Our analysis attributes BBSE's statistical inefficiency to a loss of information due to coarse calibration. Experiments on synthetic data, MNIST, and CIFAR10 support our findings.
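As a rough illustration of the two estimators the abstract contrasts, the sketch below runs both on a tiny discrete toy problem. It is not the authors' code. The function names `bbse` and `mlls_em`, the toy distributions, and the deterministic "samples" built from exact counts are all assumptions made for this example. Clipping negative weights in `bbse` is likewise a convenience added here. The maximum-likelihood objective is optimized with a standard EM-style fixed-point update, which is one possible optimizer rather than something the abstract specifies.

```python
import numpy as np

def bbse(source_preds, source_labels, target_preds, num_classes):
    """Moment matching: solve C w = mu for w = p_t(y) / p_s(y)."""
    # C[i, j] estimates the source joint p_s(y_hat = i, y = j).
    C = np.zeros((num_classes, num_classes))
    np.add.at(C, (source_preds, source_labels), 1.0)
    C /= len(source_labels)
    # mu[i] estimates the target marginal of hard predictions p_t(y_hat = i).
    mu = np.bincount(target_preds, minlength=num_classes) / len(target_preds)
    w = np.linalg.solve(C, mu)  # requires C to be invertible
    source_prior = np.bincount(source_labels, minlength=num_classes) / len(source_labels)
    # Assumption of this sketch: clip negative weights, then renormalize.
    est = np.clip(w, 0.0, None) * source_prior
    return est / est.sum()

def mlls_em(target_probs, source_prior, num_iters=500, tol=1e-10):
    """Maximize the target log-likelihood over the label marginal q via EM.

    target_probs[n, y] is assumed to be a calibrated estimate of p_s(y | x_n).
    """
    q = source_prior.copy()
    for _ in range(num_iters):
        # E-step: reweight source posteriors by q / p_s and renormalize per row.
        post = target_probs * (q / source_prior)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: the new marginal is the average posterior.
        new_q = post.mean(axis=0)
        converged = np.abs(new_q - q).max() < tol
        q = new_q
        if converged:
            break
    return q

# Hypothetical toy problem: 2 labels, 3 discrete feature values, p(x|y) fixed.
p_x_given_y = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.3, 0.6]])   # rows: y, columns: x
source_prior = np.array([0.5, 0.5])
target_prior = np.array([0.2, 0.8])        # the quantity to recover

joint = source_prior[:, None] * p_x_given_y   # p_s(y, x)
posterior = (joint / joint.sum(axis=0)).T      # posterior[x, y] = p_s(y | x)
hard = posterior.argmax(axis=1)                # hard prediction for each x

# Deterministic "samples": counts match the population proportions exactly.
source_counts = np.rint(joint * 100).astype(int).ravel()
source_x = np.repeat(np.tile(np.arange(3), 2), source_counts)
source_y = np.repeat(np.repeat(np.arange(2), 3), source_counts)
target_counts = np.rint(target_prior @ p_x_given_y * 100).astype(int)
target_x = np.repeat(np.arange(3), target_counts)

bbse_est = bbse(hard[source_x], source_y, hard[target_x], num_classes=2)
mlls_est = mlls_em(posterior[target_x], source_prior)
print("BBSE:", bbse_est)
print("MLLS:", mlls_est)
```

Because the toy classifier is exactly calibrated and the counts are exact, both estimators should land close to the target marginal here. The statistical differences the abstract describes only show up with finite samples and imperfect calibration. Note that `bbse` sees only hard predictions while `mlls_em` uses the full probability vectors, which loosely mirrors the "coarse calibration" point made in the abstract.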


Updated: 2020-10-20
