A novel dynamic ensemble selection classifier for an imbalanced data set: An application for credit risk assessment
Knowledge-Based Systems (IF 7.2), Pub Date: 2020-09-16, DOI: 10.1016/j.knosys.2020.106462
Wen-hui Hou, Xiao-kang Wang, Hong-yu Zhang, Jian-qiang Wang, Lin Li

Credit risk assessment is usually treated as an imbalanced classification task and solved with static ensemble classifiers. However, the dynamic ensemble selection (DES) strategy, which can select a different ensemble of classifiers for each query sample, is rarely used. The major challenge is that existing DES algorithms are deficient in dealing with imbalanced data. In this paper, a novel combined DES model is developed for imbalanced learning problems. To handle imbalanced data sets, the synthetic minority over-sampling technique (SMOTE) is first used to balance the training set before a candidate classifier pool is generated; then, the weighting mechanism of DES-MI (multi-class imbalance) is used to highlight the importance of minority instances when evaluating classifier competence. To further ensure a comprehensive evaluation and the right selection of the ensemble classifier, the meta-learning framework of META-DES is used to account for multiple criteria, and the two-step selection strategy of DES-KNN (k-nearest neighbours) is employed to trade off the competence and diversity of the classifiers. Our experiments on 15 imbalanced data sets from the KEEL repository show that the proposed model improves on the performance of seven well-known DES algorithms in terms of the area under the curve. Moreover, the type I error rate of the proposed method is lower than that of XGBoost and LightGBM on a real P2P loan data set, indicating the efficiency of the proposed method for credit risk assessment.
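
The two imbalance-handling ideas named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the fixed minority_weight of 2.0, and the "keep the best-scoring classifiers" rule are assumptions made for illustration only. The first function interpolates between minority points in the manner of SMOTE; the second scores each classifier on the query's nearest validation samples, with minority samples counting more, which is only loosely in the spirit of the DES-MI weighting. The meta-learning and diversity steps described in the abstract are not covered.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating towards neighbours.

    Assumes `minority` holds at least two points (tuples of floats).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def select_competent(classifiers, validation, query, k=5,
                     minority_label=1, minority_weight=2.0):
    """Return indices of the classifiers with the best weighted local accuracy.

    `classifiers` are callables x -> label; `validation` is a list of (x, y).
    The weight given to minority samples is a hypothetical constant.
    """
    # Region of competence: the k validation samples closest to the query
    region = sorted(validation, key=lambda xy: math.dist(xy[0], query))[:k]
    scores = []
    for clf in classifiers:
        total = correct = 0.0
        for x, y in region:
            w = minority_weight if y == minority_label else 1.0
            total += w
            correct += w * (clf(x) == y)
        scores.append(correct / total)
    best = max(scores)
    return [i for i, s in enumerate(scores) if s == best]
```

In practice the synthetic points would be appended to the training set before the classifier pool is trained, and the selection function would be called once per query sample, so that different samples can be assigned different classifiers.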




Updated: 2020-09-20