Simulated annealing based undersampling (SAUS): a hybrid multi-objective optimization method to tackle class imbalance
Applied Intelligence (IF 5.3), Pub Date: 2021-06-04, DOI: 10.1007/s10489-021-02369-4
Venkata Krishnaveni Chennuru , Sobha Rani Timmappareddy

Learning from imbalanced datasets is a challenging problem in machine learning research, since the performance of traditional classifiers suffers from classification biased towards the majority class, resulting in a low minority-class prediction rate. The inherent assumptions of equal class distribution and accuracy-driven evaluation are the identified reasons behind this degraded performance. Further, false negatives carry a higher penalty than false positives. A simple, logical solution to mitigate this issue is to construct a balanced training set from the imbalanced one. However, many such balanced training sets can be formed from a given imbalanced set, and an optimal one has to be selected among them; this selection is a computationally intractable problem that is prone to local optima. To address these issues, a Simulated Annealing-based Under Sampling (SAUS) method is proposed. Simulated annealing is a popular meta-heuristic search algorithm, and SAUS couples it with a novel cost function defined in terms of the Balanced Error Rate. This cost function strikes a balance between the sensitivity and specificity measures while evaluating the solution at each iteration of the subsampling process, and it allows the search to escape local traps. The experimental results show that the average sensitivity of SAUS on the test set improves from 0.68 to 0.86, which demonstrates its efficacy in tackling the imbalance in the dataset. Area Under the ROC Curve (AUC) results also show that SAUS outperforms several popular undersampling methods, and it works on par with state-of-the-art solutions for the class imbalance problem.
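The abstract does not include code; the following is a minimal Python sketch of the idea it describes: keep all minority samples, search over equally sized majority subsets with simulated annealing, and score each candidate balanced subset by the Balanced Error Rate, BER = 1 - (sensitivity + specificity)/2. The classifier, cooling schedule, neighbour move, and names such as `ber_cost` and `saus` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score

def ber_cost(clf, X_val, y_val):
    """BER = 1 - (sensitivity + specificity) / 2, penalizing both classes' errors equally."""
    y_pred = clf.predict(X_val)
    sens = recall_score(y_val, y_pred, pos_label=1)   # minority-class recall
    spec = recall_score(y_val, y_pred, pos_label=0)   # majority-class recall
    return 1.0 - (sens + spec) / 2.0

def saus(X, y, X_val, y_val, iters=500, t0=1.0, alpha=0.99, seed=0):
    """Sketch of simulated-annealing undersampling; class 1 is assumed to be the minority."""
    rng = np.random.default_rng(seed)
    min_idx = np.where(y == 1)[0]                     # keep every minority sample
    maj_idx = np.where(y == 0)[0]
    current = rng.choice(maj_idx, size=len(min_idx), replace=False)  # balanced start

    def evaluate(maj_subset):
        idx = np.concatenate([min_idx, maj_subset])
        clf = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
        return ber_cost(clf, X_val, y_val)

    cost, temp = evaluate(current), t0
    best, best_cost = current.copy(), cost
    for _ in range(iters):
        # Neighbour move: swap one retained majority sample for an excluded one.
        candidate = current.copy()
        pool = np.setdiff1d(maj_idx, candidate)
        candidate[rng.integers(len(candidate))] = rng.choice(pool)

        cand_cost = evaluate(candidate)
        # Always accept improvements; accept worse subsets with a temperature-scaled
        # probability, which is what lets the search escape local optima.
        if cand_cost < cost or rng.random() < np.exp((cost - cand_cost) / temp):
            current, cost = candidate, cand_cost
            if cost < best_cost:
                best, best_cost = current.copy(), cost
        temp *= alpha                                  # geometric cooling schedule

    return np.concatenate([min_idx, best])             # indices of the chosen balanced training set
```

In this sketch the balanced subset returned by `saus` would be used to train the final classifier, while `X_val`/`y_val` serve only to compute the BER cost during the search.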



Updated: 2021-06-04