The effect of rebalancing on LDA in imbalanced classification
Stat (IF 0.7), Pub Date: 2021-04-27, DOI: 10.1002/sta4.384
Arlene K. H. Kim, Hyunwoo Chung

In binary classification, class imbalance is undesirable because it can degrade classifier performance. One remedy is to rebalance the classes at an optimal rate. This rate is rarely derived theoretically and is usually found empirically, because the derivation is complex and depends on the classifier. To make the problem tractable, we use a linear discriminant classifier and, assuming normality, derive the theoretical optimal rate that maximizes the Matthews correlation coefficient (MCC) and the F1 score. We show that equalizing the class sizes is not always the best solution. Instead, there exists an optimal rate that depends on the level of class imbalance and the Mahalanobis distance between the two classes. Through extensive simulation studies and real data analyses, we confirm that rebalancing at the optimal rate improves the test MCC and F1 score. These findings suggest that, with careful consideration of the level of class imbalance and the separability of the two classes, better classification results can be achieved in the presence of class imbalance.
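
The dependence of the best rebalancing rate on the imbalance level and the Mahalanobis distance can be illustrated with a small population-level sketch. This is not the paper's derivation. It assumes two univariate normal classes with unit variance, means 0 and `delta` (so `delta` is the Mahalanobis distance), a minority class 1 with true proportion `pi_true`, and it models rebalancing simply as replacing the prior in the LDA rule with `pi_rebal`, ignoring estimation error. All function and parameter names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def confusion(pi_true, pi_rebal, delta):
    """Population confusion proportions (tp, fp, fn, tn).

    Assumed setup: class 0 ~ N(0, 1), class 1 ~ N(delta, 1), delta > 0.
    The LDA rule with prior pi_rebal for class 1 predicts class 1 when
    x > delta / 2 + log((1 - pi_rebal) / pi_rebal) / delta.
    """
    threshold = delta / 2.0 + math.log((1.0 - pi_rebal) / pi_rebal) / delta
    tpr = normal_cdf(delta - threshold)  # P(x > threshold | class 1)
    fpr = normal_cdf(-threshold)         # P(x > threshold | class 0)
    tp = pi_true * tpr
    fn = pi_true * (1.0 - tpr)
    fp = (1.0 - pi_true) * fpr
    tn = (1.0 - pi_true) * (1.0 - fpr)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn, tn):
    """F1 = 2TP / (2TP + FP + FN); tn is unused but kept for a uniform signature."""
    denom = 2.0 * tp + fp + fn
    return 2.0 * tp / denom if denom > 0 else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion proportions."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def best_rate(metric, pi_true, delta, grid_size=999):
    """Grid-search the rebalanced minority proportion that maximizes metric."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return max(grid, key=lambda r: metric(*confusion(pi_true, r, delta)))


if __name__ == "__main__":
    # Illustrative values only: 10% minority class, Mahalanobis distance 1.
    for name, metric in (("F1", f1_score), ("MCC", mcc)):
        rate = best_rate(metric, pi_true=0.1, delta=1.0)
        balanced = metric(*confusion(0.1, 0.5, 1.0))
        print(name, "best rate:", rate, "value at 0.5:", round(balanced, 4))
```

Under these assumptions, varying `pi_true` and `delta` shows how the maximizing rate moves with the imbalance level and the separation between classes, and that it need not equal 0.5. A grid search is used here only for transparency; it stands in for the closed-form rate the abstract refers to.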

Updated: 2021-04-27