A new hybrid ensemble model with voting-based outlier detection and balanced sampling for credit scoring
Expert Systems with Applications (IF 8.5), Pub Date: 2021-02-20, DOI: 10.1016/j.eswa.2021.114744
Wenyu Zhang, Dongqi Yang, Shuai Zhang

Credit scoring has been transformed by the development of the financial system and has received increasing attention from academia and industry. Artificial intelligence has reshaped credit scoring through predictive classification. This study proposes a new hybrid ensemble model with voting-based outlier detection and balanced sampling to achieve superior predictive power for credit scoring. To prevent noisy data from misleading classifier training, a new voting-based outlier detection method enhances classic outlier detection algorithms with a weighted voting mechanism and boosts the outlier scores into the training set to form an outlier-adapted training set. To reduce the information loss caused by under-sampling imbalanced data, a new bagging-based balanced sampling method enhances traditional under-sampling methods with a bagging strategy to obtain a balanced training set. To further improve the performance of the proposed model, a stacking-based ensemble modeling method first performs parameter optimization and then constructs a stacking-based multi-stage ensemble model. The model was evaluated on five datasets from the UC Irvine Machine Learning Repository using five evaluation indicators. The experimental results indicate the superior performance of the proposed model and demonstrate its robustness and effectiveness.
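The abstract names two data-preparation ideas without giving their details: weighted voting over several outlier detectors, and under-sampling repeated across bags. The following is a minimal sketch of both ideas under stated assumptions, not the authors' implementation. The function names, the min-max scaling of detector scores, the choice of weights, and the 0/1 label encoding are all hypothetical; the abstract does not specify which detectors are used, how they are weighted, or how the combined scores enter the training set.

```python
import random
from typing import List, Sequence


def weighted_vote_outlier_scores(
    detector_scores: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine per-detector outlier scores with a weighted average.

    detector_scores[d][i] is detector d's score for sample i.
    Assumption: scores are min-max scaled per detector so that
    detectors with different score ranges are comparable.
    """
    if len(detector_scores) != len(weights):
        raise ValueError("need exactly one weight per detector")
    total_weight = sum(weights)
    combined = [0.0] * len(detector_scores[0])
    for scores, weight in zip(detector_scores, weights):
        low, high = min(scores), max(scores)
        span = (high - low) or 1.0  # constant scores: avoid dividing by zero
        for i, score in enumerate(scores):
            combined[i] += weight * (score - low) / span
    return [value / total_weight for value in combined]


def balanced_bags(labels: Sequence[int], n_bags: int, seed: int = 0) -> List[List[int]]:
    """Build n_bags balanced index sets from binary (0/1) labels.

    Each bag keeps every minority-class sample and draws an equally
    sized random subset of the majority class without replacement.
    Different bags see different majority samples, so less majority
    information is discarded overall than with a single under-sample.
    """
    rng = random.Random(seed)
    ones = [i for i, y in enumerate(labels) if y == 1]
    zeros = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (ones, zeros) if len(ones) <= len(zeros) else (zeros, ones)
    bags = []
    for _ in range(n_bags):
        drawn = rng.sample(majority, len(minority))
        bags.append(sorted(minority + drawn))
    return bags


if __name__ == "__main__":
    # Two hypothetical detectors scoring three samples on different scales.
    scores = weighted_vote_outlier_scores([[0.1, 0.2, 0.9], [10, 20, 30]], [0.5, 0.5])
    print(scores)
    # Eight samples, two of them in the minority class (label 1).
    print(balanced_bags([1, 0, 0, 0, 0, 1, 0, 0], n_bags=3, seed=1))
```

In a full pipeline, one base classifier would typically be trained per bag and the resulting models combined, for example by a stacking meta-learner as the abstract describes; that stage is omitted here because the abstract gives no detail on the base learners or the parameter optimization.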




Updated: 2021-03-04