Optimization of SMOTE for imbalanced data based on AdaRBFNN and hybrid metaheuristics
Intelligent Data Analysis (IF 1.7) · Pub Date: 2021-04-20 · DOI: 10.3233/ida-205176
Zicheng Wang , Yanrui Sun

The oversampling ratio N and the minority-class nearest-neighbor count k are the key hyperparameters through which the synthetic minority oversampling technique (SMOTE) reshapes a dataset's class distribution, and no universally optimal default values exist. It is therefore necessary to examine how the resampled dataset affects classification performance when SMOTE adopts different hyperparameter combinations. In this paper, we propose a hyperparameter optimization algorithm for imbalanced data that iteratively searches for suitable values of N and k for SMOTE, so as to build a balanced, high-quality dataset. As a result, a model with outstanding performance and strong generalization ability can be trained, effectively addressing imbalanced classification. The proposed algorithm hybridizes the simulated annealing (SA) mechanism with particle swarm optimization (PSO). During optimization, Cohen's Kappa is used to construct the fitness function, and AdaRBFNN, a new classifier, is built by ensembling multiple trained RBF neural networks with the AdaBoost algorithm. The Kappa of each generation is computed from the classification results to evaluate the quality of each candidate solution. Experiments on seven groups of KEEL datasets show that the proposed algorithm delivers excellent performance and significantly improves classification accuracy on the minority class.
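To make the role of the two hyperparameters concrete, the following is a minimal, self-contained sketch of the inner loop the abstract describes: generate N synthetic minority samples with a hand-rolled SMOTE parameterized by (N, k), then score the resampled dataset with Cohen's Kappa as the fitness value. The paper evaluates candidates with its AdaRBFNN ensemble inside an SA-PSO search; here a simple nearest-centroid classifier stands in for AdaRBFNN, and all function names (`smote`, `cohen_kappa`, `fitness`) are illustrative, not from the paper.

```python
import numpy as np

def smote(X_min, N, k, rng):
    """Create N synthetic minority samples: pick a minority sample,
    pick one of its k nearest minority neighbors, interpolate."""
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=2)
    np.fill_diagonal(d, np.inf)              # exclude self-distance
    nn = np.argsort(d, axis=1)[:, :k]        # k nearest neighbors per sample
    synth = []
    for _ in range(N):
        i = rng.integers(len(X_min))
        j = nn[i, rng.integers(k)]
        lam = rng.random()                   # interpolation factor in [0, 1)
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synth)

def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: agreement corrected for chance."""
    po = np.mean(y_true == y_pred)
    pe = sum(np.mean(y_true == c) * np.mean(y_pred == c)
             for c in np.unique(y_true))
    return (po - pe) / (1 - pe) if pe < 1 else 0.0

def fitness(N, k, X_maj, X_min, rng):
    """Fitness of one candidate (N, k): resample, train a stand-in
    classifier, return Kappa. A real run would score held-out data."""
    X_syn = smote(X_min, N, k, rng)
    X = np.vstack([X_maj, X_min, X_syn])
    y = np.array([0] * len(X_maj) + [1] * (len(X_min) + len(X_syn)))
    c0, c1 = X[y == 0].mean(0), X[y == 1].mean(0)  # nearest-centroid model
    pred = (np.linalg.norm(X - c1, axis=1)
            < np.linalg.norm(X - c0, axis=1)).astype(int)
    return cohen_kappa(y, pred)

rng = np.random.default_rng(0)
X_maj = rng.normal(0, 1, (50, 2))   # toy majority cluster
X_min = rng.normal(3, 1, (10, 2))   # toy minority cluster
print(fitness(40, 3, X_maj, X_min, rng))
```

In the full algorithm, each PSO particle encodes one (N, k) pair, the SA mechanism decides whether to accept worse moves, and this fitness call is what each particle evaluates per generation.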

Updated: 2021-04-23