A Relabeling Approach to Handling the Class Imbalance Problem for Logistic Regression
Journal of Computational and Graphical Statistics (IF 2.4), Pub Date: 2021-11-12, DOI: 10.1080/10618600.2021.1978470
Yazhe Li, Niall Adams, Tony Bellotti

Abstract

Logistic regression is a standard procedure for real-world classification problems. The challenge of class imbalance arises in two-class classification problems when the minority class is observed much less often than the majority class. This characteristic is endemic in many domains. Work by Owen has shown that cluster structure among the minority class may be a specific problem in highly imbalanced logistic regression. In this article, we propose a novel relabeling approach to handle the class imbalance problem when using logistic regression, which essentially assigns new labels to the minority class observations. An expectation–maximization algorithm is formalized to serve as a tool for efficiently computing this relabeling. Modeling on such relabeled data can lead to improved predictive performance. We demonstrate the effectiveness of this approach with detailed experiments on real datasets. Supplemental materials for the article are available online.




Updated: 2021-11-12