An ensemble imbalanced classification method based on model dynamic selection driven by data partition hybrid sampling
Expert Systems with Applications (IF 7.5), Pub Date: 2020-06-25, DOI: 10.1016/j.eswa.2020.113660
Xin Gao, Bing Ren, Hao Zhang, Bohao Sun, Junliang Li, Jianhang Xu, Yang He, Kangsheng Li

In many real-world applications, classification problems suffer from class imbalance. Classification methods for imbalanced data that rely only on data processing or algorithm improvement cannot achieve satisfactory classification performance on the minority class. This paper proposes an ensemble classification method for imbalanced data based on model dynamic selection driven by data partition hybrid sampling. The method includes two core components: the generation of balanced datasets and the dynamic selection of classification models. At the data level, a data partition hybrid sampling (DPHS) method is proposed to balance the datasets. In particular, the data space is divided into four regions according to the majority-class proportion in the neighborhoods of minority-class instances. We then present a boundary minority class weighted over-sampling (BMW-SMOTE) method, in which the weight of each minority-class instance is the ratio between the majority-class proportion in the neighborhood of that instance and the sum of these proportions over all instances; the number of synthetic instances generated for each instance is determined by its weight. At the algorithm level, we present a model dynamic selection (MDS) strategy in which three ensemble learning models are built. Among them, the local-region reinforce-and-weaken model is trained on the balanced dataset obtained by the proposed DPHS method, which strengthens the identification of test instances on the boundary and appropriately weakens the densely distributed majority class. The model used for each test instance is selected adaptively according to the imbalance degree of its neighbors. Experimental results show that the proposed method outperforms typical imbalanced classification methods in terms of F-measure and G-mean.
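The abstract describes the BMW-SMOTE weighting only in words, so the following is a minimal sketch of one possible reading of it: each minority instance is weighted by the majority-class proportion among its k nearest neighbours, normalized by the sum of these proportions, and that weight decides how many synthetic instances are interpolated around it. A scikit-learn nearest-neighbour search and standard SMOTE-style interpolation are assumed for the parts the abstract does not specify; the function name bmw_smote_sketch and the parameters k and n_synthetic are illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def bmw_smote_sketch(X_min, X_maj, n_synthetic, k=5, seed=None):
    """Sketch of the BMW-SMOTE weighting described in the abstract (assumptions noted above)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    X_all = np.vstack([X_min, X_maj])
    is_maj = np.concatenate([np.zeros(len(X_min)), np.ones(len(X_maj))])

    # Majority-class proportion among the k nearest neighbours of each
    # minority instance (the first neighbour is the instance itself and is skipped).
    nn_all = NearestNeighbors(n_neighbors=k + 1).fit(X_all)
    _, idx = nn_all.kneighbors(X_min)
    maj_prop = is_maj[idx[:, 1:]].mean(axis=1)

    # Weight = this proportion divided by the sum over all minority instances;
    # fall back to uniform weights if every neighbourhood is purely minority.
    total = maj_prop.sum()
    weights = maj_prop / total if total > 0 else np.full(len(X_min), 1.0 / len(X_min))
    counts = np.round(weights * n_synthetic).astype(int)

    # SMOTE-style interpolation towards a random minority-class neighbour.
    nn_min = NearestNeighbors(n_neighbors=min(k + 1, len(X_min))).fit(X_min)
    _, min_idx = nn_min.kneighbors(X_min)
    synthetic = []
    for i, c in enumerate(counts):
        for _ in range(c):
            j = rng.choice(min_idx[i, 1:])
            gap = rng.random()
            synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic) if synthetic else np.empty((0, X_min.shape[1]))
```

Instances whose neighbourhoods contain more majority-class samples receive larger weights and therefore more synthetic copies, which concentrates over-sampling on the class boundary rather than spreading it uniformly, consistent with the behaviour the abstract attributes to BMW-SMOTE.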


