An Ensemble Tree Classifier for Highly Imbalanced Data Classification
Journal of Systems Science and Complexity (IF 2.1), Pub Date: 2021-08-26, DOI: 10.1007/s11424-021-1038-8
Peibei Shi, Zhong Wang

Traditional imbalanced-classification algorithms perform worse when the data are highly imbalanced, and handling such data remains a difficult problem. In this paper, the authors propose an ensemble tree classifier for highly imbalanced data classification. The ensemble tree classifier is built on a complete binary tree structure. A mathematical model relating the classifier's features to its classification performance is established, and it is proven that the model parameters of the ensemble classifier can be obtained by calculation. First, AdaBoost is used as the base classifier to construct the tree-structured model. Next, the classification cost of the model is calculated, yielding a quantitative mathematical description of the relationship between the cost and the features of the ensemble tree classifier. The cost of the classification model is then transformed into an optimization problem, and the parameters of the ensemble tree classifier are obtained through theoretical derivation. The approach is tested on several highly imbalanced datasets from different fields, with AUC (area under the curve) and F-measure as the evaluation criteria. Compared with traditional imbalanced classification algorithms, the ensemble tree classifier achieves better classification performance.
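The abstract reports results in terms of AUC and F-measure. As a rough illustration of why these criteria suit highly imbalanced data, here is a minimal plain-Python sketch of how they are commonly computed. It is not the authors' evaluation code: the function names `roc_auc` and `f_measure`, the convention that label 1 marks the minority (positive) class, and the toy data are all assumptions made for this example. AUC is computed here in its pairwise (Mann-Whitney) form, i.e. the fraction of positive/negative pairs in which the positive sample receives the higher score.

```python
from typing import Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 1.0) -> float:
    """F-beta score for the minority class (assumed to be labelled 1).

    Uses the count form (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP),
    which equals the usual precision/recall formula whenever TP > 0.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    if denom == 0:
        # No positives in the data and none predicted: undefined, report 0.0.
        return 0.0
    return (1 + b2) * tp / denom


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U normalised by n_pos * n_neg).
    This O(n_pos * n_neg) loop is for clarity, not for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 8 majority (0) samples vs 2 minority (1) samples.
    y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.7, 0.2, 0.8, 0.6]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(roc_auc(y_true, scores))    # 0.9375
    print(f_measure(y_true, y_pred))  # 0.8

    # A trivial "always majority" predictor: high accuracy, zero F-measure.
    all_negative = [0] * len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, all_negative)) / len(y_true)
    print(accuracy)                          # 0.8
    print(f_measure(y_true, all_negative))   # 0.0
```

The last two lines show the motivation for these criteria: a classifier that never predicts the minority class still reaches 80% accuracy on this toy set, while its F-measure drops to 0. Because AUC is computed from the scores alone, it does not depend on the decision threshold.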

Updated: 2021-08-26