Manifold cluster-based evolutionary ensemble imbalance learning
Computers & Industrial Engineering (IF 6.7), Pub Date: 2021-07-03, DOI: 10.1016/j.cie.2021.107523
Yinan Guo (1, 2), Jiawei Feng (2), Botao Jiao (2), Linkai Yang (2), Hui Lu (3), Zekuan Yu (4)

For an imbalanced dataset, traditional machine learning methods usually misclassify minority samples because the metric used to evaluate classification accuracy is biased toward the majority class. To address this issue, manifold cluster-based evolutionary ensemble imbalance learning (MECS-Ensemble) is proposed, with the aim of providing a more effective framework for building an optimal imbalance classifier. After the original data are mapped into a manifold space, majority samples are removed from each sub-cluster according to their distribution characteristics. Then, one new minority sample is generated in each minority sub-cluster by over-sampling, with the aim of avoiding misclassified new minority samples produced from small disjuncts. In this manifold clustering-based resampling technique, the optional operations and key parameters for normalization, manifold learning, clustering, under-sampling, and over-sampling can be combined in many different ways. Therefore, an evolutionary algorithm is introduced to search for the optimal structure of MECS-Ensemble. Each individual is encoded by five integers and six real numbers, and a fitness function is designed to evaluate both its classification accuracy and the diversity of the majority samples. Statistical experimental results on 39 imbalanced datasets show that the proposed MECS-Ensemble is superior to the other imbalance learning methods compared, and in particular that the manifold clustering-based resampling technique contributes significantly to the performance improvement.
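The cluster-based resampling step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' MECS-Ensemble code and rests on several assumptions: normalization and manifold mapping are omitted (the input `X` is assumed to be already embedded), plain k-means stands in for the clustering step, majority samples in each cluster are under-sampled at random using a hypothetical `keep_ratio` (the paper instead uses the samples' distribution characteristics), and the single new minority sample per minority sub-cluster is a SMOTE-style interpolation between two minority samples of the same cluster. The names `kmeans_labels`, `cluster_resample`, and `keep_ratio` are illustrative only.

```python
import numpy as np


def kmeans_labels(X, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm), used here as a stand-in clustering step."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign every sample to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster became empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_resample(X, y, k=4, keep_ratio=0.5, seed=0):
    """Hypothetical per-cluster resampling; y == 1 is the minority class,
    y == 0 the majority class, and keep_ratio must lie in (0, 1]."""
    rng = np.random.default_rng(seed)
    labels = kmeans_labels(X, k, seed=seed)
    X_parts, y_parts = [], []
    for j in range(k):
        maj = X[(labels == j) & (y == 0)]
        mino = X[(labels == j) & (y == 1)]
        # Under-sampling (assumed criterion): keep a random subset of this
        # cluster's majority samples, at least one if the cluster has any.
        if len(maj) > 0:
            n_keep = max(1, int(np.ceil(keep_ratio * len(maj))))
            idx = rng.choice(len(maj), size=n_keep, replace=False)
            X_parts.append(maj[idx])
            y_parts.append(np.zeros(n_keep, dtype=int))
        # All original minority samples are kept.
        if len(mino) > 0:
            X_parts.append(mino)
            y_parts.append(np.ones(len(mino), dtype=int))
        # Over-sampling: one synthetic minority sample per sub-cluster that holds
        # at least two minority samples, interpolated between two of them, so the
        # new point stays inside this cluster rather than bridging to a distant
        # small disjunct.
        if len(mino) >= 2:
            a, b = mino[rng.choice(len(mino), size=2, replace=False)]
            new = a + rng.random() * (b - a)
            X_parts.append(new[None, :])
            y_parts.append(np.ones(1, dtype=int))
    return np.vstack(X_parts), np.concatenate(y_parts)
```

For example, with 40 majority and 8 minority points in two dimensions, `cluster_resample(X, y, k=4, keep_ratio=0.5)` roughly halves the majority class and adds between one and four synthetic minority samples, depending on how the minority points fall across the four clusters.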
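The mixed integer/real encoding of an individual ("five integers and six real numbers") can be sketched as below. The abstract does not say how the genes map to the pipeline, so this sketch assumes that each of the five integers selects an option for one of the five operations it names (normalization, manifold learning, clustering, under-sampling, over-sampling) and that the six reals are continuous parameters in illustrative bounds. The option menus, `REAL_BOUNDS`, and the functions `random_individual`, `decode`, and `mutate` are all hypothetical; the fitness function (accuracy plus majority-sample diversity) is not reproduced here.

```python
import random
from dataclasses import dataclass

# Hypothetical option menus, one per operation named in the abstract; the real
# option sets used by MECS-Ensemble are not given in the abstract.
STAGE_OPTIONS = {
    "normalization": ["none", "min-max", "z-score"],
    "manifold": ["none", "isomap", "lle", "laplacian-eigenmaps"],
    "clustering": ["k-means", "agglomerative", "dbscan"],
    "under_sampling": ["random", "near-centroid", "far-from-centroid"],
    "over_sampling": ["smote", "borderline-smote", "random"],
}
# Six real-valued genes with illustrative bounds (their meanings are assumed).
REAL_BOUNDS = [(0.0, 1.0)] * 6


@dataclass
class Individual:
    ints: list   # five integer genes: one option index per operation
    reals: list  # six real genes: continuous key parameters


def random_individual(rng):
    ints = [rng.randrange(len(opts)) for opts in STAGE_OPTIONS.values()]
    reals = [rng.uniform(lo, hi) for lo, hi in REAL_BOUNDS]
    return Individual(ints, reals)


def decode(ind):
    """Map integer genes to named pipeline choices (dicts keep insertion order)."""
    return {stage: opts[i] for (stage, opts), i in zip(STAGE_OPTIONS.items(), ind.ints)}


def mutate(ind, rng, p=0.2, sigma=0.1):
    """Type-aware mutation: resample integer genes, Gaussian-perturb and clip reals."""
    ints = [rng.randrange(len(opts)) if rng.random() < p else i
            for opts, i in zip(STAGE_OPTIONS.values(), ind.ints)]
    reals = [min(hi, max(lo, r + rng.gauss(0.0, sigma))) if rng.random() < p else r
             for (lo, hi), r in zip(REAL_BOUNDS, ind.reals)]
    return Individual(ints, reals)
```

Keeping the two gene types separate lets each be mutated in a way that always yields a valid configuration: integer genes stay valid option indices and real genes stay within their bounds.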



Updated: 2021-07-12