IA-SUWO: An Improving Adaptive semi-unsupervised weighted oversampling for imbalanced classification problems
Knowledge-Based Systems (IF 7.2), Pub Date: 2020-06-10, DOI: 10.1016/j.knosys.2020.106116
Jianan Wei, Haisong Huang, Liguo Yao, Yao Hu, Qingsong Fan, Dong Huang

Classification, a core task of machine learning, is widely used in real-world applications; however, imbalanced data poses great challenges to classification problems, because standard classifiers tend to favor the majority instances and ignore the minority instances. Recent oversampling algorithms (e.g., A-SUWO) based on the improving majority weighted minority oversampling (IMWMO) method assign weights according to the Euclidean distances from majority instances to hard-to-learn minority instances, and then guide the synthesis of minority samples according to these weights to counteract the shift of the classification hyperplane. When its parameters are well tuned, A-SUWO achieves better results than traditional oversampling algorithms (e.g., SMOTE and MWMOTE). However, in some irregularly distributed scenarios A-SUWO may assign inappropriate weights to minority training samples, making the learning task even harder. Additionally, A-SUWO's kNN-based synthesizing method may not produce sufficiently diverse and effective instances. Therefore, we propose an improving adaptive semi-unsupervised weighted oversampling (IA-SUWO) technique to address imbalanced classification problems more effectively. The improvement of IA-SUWO focuses on two aspects: (1) jointly considering the least squares support numerical spectrum values and the IMWMO method to assign weights to minority instances, and (2) synthesizing new instances using the k* information nearest neighbors (k*INN) method. IA-SUWO aims to maximize the probability that all important minority samples are drawn and to generate more effective (more widely scattered) boundary instances. Results demonstrate that IA-SUWO achieves significantly better performance on most datasets compared with 10 other oversampling algorithms and 2 ensemble algorithms.
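The abstract does not spell out the full IA-SUWO algorithm, but the underlying idea of distance-weighted minority oversampling can be illustrated with a short, generic sketch. The Python example below is our own illustration, not the authors' implementation: the function weighted_oversample and its parameters (k, n_new) are hypothetical names, and the hardness weighting and SMOTE-style interpolation stand in for the paper's more elaborate weighting and k*INN synthesis.

```python
# Minimal sketch of distance-weighted minority oversampling (illustrative only).
# Minority instances that lie closer to the majority class receive larger
# sampling weights; new points are interpolated between a weighted-sampled seed
# and one of its minority nearest neighbours, as in SMOTE-style methods.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def weighted_oversample(X_min, X_maj, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples (hypothetical helper)."""
    rng = np.random.default_rng(rng)

    # Hardness weight: inverse of the mean distance to the k nearest majority points.
    d_maj, _ = NearestNeighbors(n_neighbors=k).fit(X_maj).kneighbors(X_min)
    weights = 1.0 / (d_maj.mean(axis=1) + 1e-12)
    weights /= weights.sum()

    # Minority-to-minority neighbours used for interpolation (index 0 is the point itself).
    _, idx_min = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)

    synthetic = []
    for _ in range(n_new):
        i = rng.choice(len(X_min), p=weights)   # weighted seed selection
        j = rng.choice(idx_min[i, 1:])          # one of its minority neighbours
        gap = rng.random()                      # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)
```

The key design choice mirrored from the abstract is that sampling probability concentrates on hard-to-learn (boundary) minority instances rather than being uniform, which is what shifts the synthetic points toward the decision boundary.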




Updated: 2020-06-22