A novel approach to define the local region of dynamic selection techniques in imbalanced credit scoring problems
Expert Systems with Applications (IF 8.5), Pub Date: 2020-03-04, DOI: 10.1016/j.eswa.2020.113351
Leopoldo Melo Junior, Franco Maria Nardini, Chiara Renso, Roberto Trani, Jose Antonio Macedo

Lenders, such as banks and credit card companies, use credit scoring models to evaluate the potential risk posed by lending money to customers, and therefore to mitigate losses due to bad credit. The profitability of banks thus depends heavily on the models used to decide on customers’ loans. State-of-the-art credit scoring models are based on machine learning and statistical methods. One of the major problems in this field is that lenders often deal with imbalanced datasets, which usually contain many paid loans but very few unpaid ones (called defaults). Recently, dynamic selection methods combined with ensemble methods and preprocessing techniques have been evaluated to improve classification models on imbalanced datasets, showing advantages over static machine learning methods. In a dynamic selection technique, samples in the neighborhood of each query sample are used to compute the local competence of each base classifier. The technique then selects only the competent classifiers to predict the query sample. In this paper, we evaluate the suitability of dynamic selection techniques for the credit scoring problem, and we present Reduced Minority k-Nearest Neighbors (RMkNN), an approach that improves on the state of the art in defining the local region of dynamic selection techniques for imbalanced credit scoring datasets. The proposed technique achieves superior prediction performance on imbalanced credit scoring datasets compared to the state of the art. Furthermore, RMkNN does not need any preprocessing or sampling method to generate the dynamic selection dataset (called DSEL). Additionally, we observe an equivalence between dynamic selection and static selection classification. We conduct a comprehensive evaluation of the proposed technique against state-of-the-art competitors on six real-world public datasets and one private dataset. Experiments show that RMkNN improves classification performance on the evaluated datasets in terms of AUC, balanced accuracy, H-measure, G-mean, F-measure, and Recall.
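
The generic dynamic selection loop described in the abstract (find the query's neighborhood in DSEL, score each base classifier's local competence, keep only the competent ones) might be sketched as follows. This is a minimal illustration of plain k-nearest-neighbor, local-accuracy selection, not of RMkNN itself, whose definition of the local region is not given in the abstract. The function names, the toy data, the three hand-written classifiers, the competence threshold of 0.5, and the fallback to the whole pool are all assumptions made for the example.

```python
from collections import Counter
from math import dist


def k_nearest(dsel_X, query, k):
    """Return indices of the k DSEL samples closest to the query (Euclidean)."""
    order = sorted(range(len(dsel_X)), key=lambda i: dist(dsel_X[i], query))
    return order[:k]


def local_competence(clf, dsel_X, dsel_y, region):
    """Fraction of samples in the local region that clf labels correctly."""
    hits = sum(clf(dsel_X[i]) == dsel_y[i] for i in region)
    return hits / len(region)


def dynamic_predict(pool, dsel_X, dsel_y, query, k=3, threshold=0.5):
    """Predict the query by majority vote of the locally competent classifiers."""
    region = k_nearest(dsel_X, query, k)
    scores = [local_competence(clf, dsel_X, dsel_y, region) for clf in pool]
    selected = [clf for clf, s in zip(pool, scores) if s >= threshold]
    if not selected:
        # Assumption: if no classifier reaches the threshold, use the whole pool.
        selected = list(pool)
    votes = Counter(clf(query) for clf in selected)
    return votes.most_common(1)[0][0]  # ties go to the label seen first


# Hypothetical base classifiers; label 1 = default (minority), 0 = paid.
def by_first_feature(x):
    return 1 if x[0] > 0.5 else 0


def by_second_feature(x):
    return 1 if x[1] > 0.5 else 0


def always_paid(x):
    return 0


pool = [by_first_feature, by_second_feature, always_paid]

# Toy DSEL: the true label follows the second feature.
dsel_X = [(0.1, 0.1), (0.2, 0.9), (0.3, 0.8), (0.9, 0.2), (0.8, 0.1), (0.9, 0.9)]
dsel_y = [0, 1, 1, 0, 0, 1]

print(dynamic_predict(pool, dsel_X, dsel_y, (0.25, 0.85)))  # -> 1
print(dynamic_predict(pool, dsel_X, dsel_y, (0.85, 0.2)))   # -> 0
```

For the first query, only the second-feature classifier is accurate on the three nearest DSEL samples, so it alone casts the vote. In an imbalanced dataset a plain k-NN region like this one can contain few or no minority samples, which is the situation the paper's redefinition of the local region targets.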




Updated: 2020-03-04