Knowledge-Based Systems (IF 7.2), Pub Date: 2019-10-16, DOI: 10.1016/j.knosys.2019.105118
Jin Xiao, Xu Zhou, Yu Zhong, Ling Xie, Xin Gu, Dunhu Liu
In realistic credit-scoring problems, only a few customers can be labeled, while many others cannot. Traditional supervised learning methods can use only labeled samples to build credit-scoring models, so satisfactory performance is difficult to achieve. Semi-supervised learning (SSL) can use both labeled and unlabeled samples to address this problem, but existing credit-scoring research has primarily constructed single semi-supervised models. This study combines SSL, cost-sensitive learning, the group method of data handling (GMDH), and ensemble learning to propose a GMDH-based cost-sensitive semi-supervised selective ensemble (GCSSE) model. The model involves two stages: (1) train an ensemble of base classifiers on the initial training set with class labels, use it to selectively label samples from the dataset without class labels, add them with their predicted labels to the training set, and update the base classifiers on the new training set; (2) classify the test set using each of the trained base classifiers, and construct a cost-sensitive GMDH neural network to obtain the selective ensemble classification results for the test set. Experimental comparisons on five public customer credit score datasets and an empirical analysis of a real customer credit score dataset suggest that this model exhibits the best overall credit-scoring performance compared with one supervised ensemble model and three semi-supervised ensemble models.
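The two-stage procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it is hypothetical. Three assumptions are swapped in for details the abstract does not give: the base classifiers are simple nearest-centroid scorers, "selective labeling" is taken to mean keeping only unlabeled samples whose ensemble score is confidently near 0 or 1, and the cost-sensitive GMDH neural network is replaced by a much simpler stand-in that keeps the base classifiers with the lowest misclassification cost and applies a cost-based decision threshold. Both classes are assumed to be present in the labeled set, with class 1 denoting a bad (defaulting) customer.

```python
import math
import random

def fit_centroids(X, y):
    """Hypothetical base learner: one mean vector per class (0 = good, 1 = bad)."""
    cents = []
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        cents.append([sum(col) / len(rows) for col in zip(*rows)])
    return cents

def score(cents, x):
    """Estimated P(bad): logistic of the squared-distance gap to the two centroids."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, cents[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, cents[1]))
    z = max(-50.0, min(50.0, (d0 - d1) / 2))  # clipped to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def ensemble_score(models, x):
    """Average the base classifiers' scores for one sample."""
    return sum(score(m, x) for m in models) / len(models)

def train_ensemble(X, y, n_models, rng):
    """Fit each base classifier on a class-stratified bootstrap resample."""
    models = []
    for _ in range(n_models):
        idx = []
        for c in (0, 1):
            members = [i for i, label in enumerate(y) if label == c]
            idx += rng.choices(members, k=len(members))
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models

def stage1_self_label(X_l, y_l, X_u, n_models=5, conf=0.9, rounds=3, seed=0):
    """Stage 1: pseudo-label confident unlabeled samples, then retrain the ensemble."""
    rng = random.Random(seed)
    X_l, y_l, X_u = list(X_l), list(y_l), list(X_u)
    models = train_ensemble(X_l, y_l, n_models, rng)
    for _ in range(rounds):
        scored = [(x, ensemble_score(models, x)) for x in X_u]
        sure = [(x, p) for x, p in scored if p >= conf or p <= 1 - conf]
        if not sure:
            break
        X_l += [x for x, _ in sure]
        y_l += [int(p >= 0.5) for _, p in sure]
        X_u = [x for x, p in scored if 1 - conf < p < conf]
        models = train_ensemble(X_l, y_l, n_models, rng)
    return models, X_l, y_l

def stage2_predict(models, X_val, y_val, X_test, cost_fn=5.0, cost_fp=1.0, keep=3):
    """Stage 2 stand-in (not GMDH): keep the cheapest classifiers, threshold by cost."""
    thr = cost_fp / (cost_fp + cost_fn)  # minimum-expected-cost decision threshold

    def cost(model):
        total = 0.0
        for x, label in zip(X_val, y_val):
            pred = int(score(model, x) >= thr)
            if label == 1 and pred == 0:
                total += cost_fn  # missed a bad customer
            elif label == 0 and pred == 1:
                total += cost_fp  # rejected a good customer
        return total

    chosen = sorted(models, key=cost)[:keep]
    return [int(ensemble_score(chosen, x) >= thr) for x in X_test]
```

A call such as `models, X_aug, y_aug = stage1_self_label(X_l, y_l, X_u)` followed by `stage2_predict(models, X_l, y_l, X_test)` runs both stages. The default costs (`cost_fn=5.0`, `cost_fp=1.0`) are illustrative values only; they encode the usual credit-scoring assumption that accepting a bad customer costs more than rejecting a good one.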
Title: Cost-sensitive semi-supervised selective ensemble model for customer credit scoring