New hybrid data mining model for credit scoring based on feature selection algorithm and ensemble classifiers
Advanced Engineering Informatics ( IF 8.0 ) Pub Date : 2020-06-12 , DOI: 10.1016/j.aei.2020.101130
Jasmina Nalić , Goran Martinović , Drago Žagar

The aim of this paper is to propose a new hybrid data mining model that combines several feature selection algorithms with ensemble learning classification algorithms in order to support the decision-making process. The model is built in several stages. In the first stage, the initial dataset is preprocessed; in addition to applying different preprocessing techniques, we paid particular attention to feature selection. Five feature selection algorithms were applied, and their results, assessed by the ROC and accuracy measures of a logistic regression algorithm, were combined using different voting types. We also propose a new voting method, called if_any, which outperformed all other voting methods as well as the results of any single feature selection algorithm. In the next stage, four classification algorithms (generalized linear model, support vector machine, naive Bayes, and decision tree) were trained on the dataset obtained from the feature selection process. These classifiers were combined into eight different ensemble models using the soft voting method. On a real dataset, the experimental results show that the hybrid model based on the features selected by the if_any voting method and the GLM + DT ensemble achieves the highest performance and outperforms all other ensemble and single-classifier models.
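The two combination steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define if_any, so the rule below (keep a feature if at least one selection algorithm chose it) is an assumption based on the name, not the authors' definition. The function names `vote_features` and `soft_vote`, the `majority` and `if_all` rules, and the sample feature names are likewise hypothetical. Soft voting is shown in its usual form: the class probabilities of the individual classifiers are averaged, and the class with the highest mean is predicted.

```python
from collections import Counter


def vote_features(selections, rule="if_any"):
    """Combine the feature subsets chosen by several selection algorithms.

    selections: list of iterables, one per feature selection algorithm.
    rule: "if_any" (assumed meaning: chosen by at least one algorithm),
          "majority" (chosen by more than half), or
          "if_all" (chosen by every algorithm).
    """
    n = len(selections)
    # Count how many algorithms selected each feature (once per algorithm).
    counts = Counter(f for chosen in selections for f in set(chosen))
    thresholds = {"if_any": 1, "majority": n // 2 + 1, "if_all": n}
    if rule not in thresholds:
        raise ValueError(f"unknown voting rule: {rule}")
    return sorted(f for f, c in counts.items() if c >= thresholds[rule])


def soft_vote(probabilities):
    """Average per-classifier class probabilities for one sample.

    probabilities: list of dicts mapping class label -> probability,
    one dict per classifier. Returns (predicted_label, averaged_dict).
    """
    n = len(probabilities)
    classes = list(probabilities[0])
    averaged = {c: sum(p[c] for p in probabilities) / n for c in classes}
    return max(averaged, key=averaged.get), averaged


# Hypothetical feature subsets from three selection algorithms.
selections = [["age", "income"], ["income", "debt"], ["income"]]
any_features = vote_features(selections, "if_any")
majority_features = vote_features(selections, "majority")

# Hypothetical class probabilities from two classifiers (e.g. GLM and DT).
label, averaged = soft_vote([
    {"good": 0.6, "bad": 0.4},
    {"good": 0.3, "bad": 0.7},
])
```

Under this assumed reading, if_any is the most permissive rule and returns the union of all selected features, whereas a majority rule keeps only features on which most algorithms agree.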




Updated: 2020-06-12