A new approach in reject inference of using ensemble learning based on global semi-supervised framework
Future Generation Computer Systems (IF 7.5), Pub Date: 2020-04-07, DOI: 10.1016/j.future.2020.03.047
Yan Liu, Xiner Li, Zaimei Zhang

Credit scoring in online peer-to-peer (P2P) lending faces a major challenge: credit scoring models discard rejected applicants. This selective discarding biases the model parameters and ultimately degrades the performance of credit evaluation. One approach to this problem is reject inference, a technique that infers the status of rejected samples and incorporates the results into credit scoring models. The most popular practice of reject inference is to use a credit scoring model built only on accepted samples to directly predict the status of rejected samples. However, in online P2P lending the distribution of accepted samples differs from that of rejected samples. We propose SSL-EC3, a global semi-supervised framework that combines multiple classifiers and clustering algorithms to make better use of the information in rejected samples. It uses multiple unsupervised models (clustering algorithms) to explore the internal relationships among all samples, and then incorporates this information into the ensemble of supervised models (classifiers) to help correct the initial classification results for rejected samples. In addition, we explore dynamic ensemble selection (DES) to select an appropriate ensemble of classifiers for each sample to be classified. Experimental results on real data sets demonstrate the benefits of the proposed methods over conventional reject inference methods.
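
The idea of letting cluster structure correct the initial predictions for rejected samples can be illustrated with a small sketch. This is not the SSL-EC3 algorithm from the paper, only a simplified stand-in under stated assumptions: a single clustering, a single initial probability per sample, and a hypothetical blending weight `alpha`. The function name and all parameters are invented for illustration.

```python
from collections import defaultdict


def cluster_corrected_scores(initial_probs, cluster_ids, known_labels, alpha=0.5):
    """Blend each rejected sample's initial P(default) with its cluster's mean.

    initial_probs: P(default) from a model trained on accepted samples only
    cluster_ids:   cluster assignment for every sample (accepted and rejected)
    known_labels:  1 or 0 for accepted samples, None for rejected samples
    alpha:         weight given to the cluster-level evidence (hypothetical)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for p, c, y in zip(initial_probs, cluster_ids, known_labels):
        # Accepted samples contribute their observed label,
        # rejected samples contribute their initial prediction.
        sums[c] += y if y is not None else p
        counts[c] += 1

    corrected = []
    for p, c, y in zip(initial_probs, cluster_ids, known_labels):
        if y is not None:
            # Observed outcomes are kept as they are.
            corrected.append(float(y))
        else:
            cluster_mean = sums[c] / counts[c]
            corrected.append((1 - alpha) * p + alpha * cluster_mean)
    return corrected


# Toy data: two clusters, each with two accepted samples and one rejected sample.
scores = cluster_corrected_scores(
    initial_probs=[0.1, 0.2, 0.6, 0.9, 0.8, 0.4],
    cluster_ids=[0, 0, 0, 1, 1, 1],
    known_labels=[0, 0, None, 1, 1, None],
)
print([round(s, 3) for s in scores])  # [0.0, 0.0, 0.4, 1.0, 1.0, 0.6]
```

In this toy case the rejected sample in the mostly non-defaulting cluster is pulled down from 0.6 to about 0.4, and the one in the defaulting cluster is pulled up from 0.4 to about 0.6. The framework described in the abstract combines several classifiers and several clusterings, which this sketch does not attempt to reproduce.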




Updated: 2020-04-07