ODBOT: Outlier detection-based oversampling technique for imbalanced datasets learning
Neural Computing and Applications (IF 4.5), Pub Date: 2021-06-21, DOI: 10.1007/s00521-021-06198-x
Mohammed H. IBRAHIM

In many real-world problems, datasets are imbalanced: the majority classes contain far more samples than the minority classes. In general, machine learning and data mining classification algorithms perform poorly on imbalanced datasets. In recent years, various oversampling techniques have been developed in the literature to address the class imbalance problem. Unfortunately, few of these techniques account for the relationships between classes or exploit the correlation between attributes. Moreover, most existing oversampling techniques do not handle multi-class imbalanced datasets. To this end, this paper proposes a simple but effective outlier detection-based oversampling technique (ODBOT) to handle the multi-class imbalance problem. In ODBOT, outlier samples are detected by clustering within the minority class(es), and synthetic samples are then generated with these outlier samples taken into account. ODBOT generates efficient and consistent synthetic samples for the minority class(es) by analyzing the dissimilarity relationships among the attribute values of all classes. Moreover, ODBOT can reduce the risk of overlap among different class regions and can yield a better classification model. The performance of ODBOT is evaluated in extensive experiments using 60 commonly used imbalanced datasets and five classification algorithms. The experimental results show that ODBOT consistently outperformed other common and state-of-the-art techniques across various evaluation criteria.
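The abstract does not spell out the algorithm, so the following Python sketch only illustrates the general idea of clustering a minority class, flagging outliers, and generating synthetic samples away from them. It is not the paper's method: the plain k-means routine, the "mean plus one standard deviation" outlier rule, the interpolation toward the cluster centroid, and the names `kmeans` and `oversample_minority` are all assumptions made for illustration.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on lists of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Recompute each centroid as the mean of its current members.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def oversample_minority(minority, target_size, k=2, seed=0):
    """Return (synthetic_samples, outlier_flags) for one minority class.

    Assumed outlier rule: a sample is an outlier when its distance to its
    cluster centroid exceeds the mean distance plus one standard deviation.
    """
    rng = random.Random(seed)
    centroids, labels = kmeans(minority, k, seed=seed)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(minority, labels)]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    outlier = [d > mean + std for d in dists]

    # Assumed generation rule: interpolate between a non-outlier sample and
    # its cluster centroid, so outliers never seed new points.
    seeds = [(p, lab) for p, lab, out in zip(minority, labels, outlier) if not out]
    synthetic = []
    while len(minority) + len(synthetic) < target_size:
        p, lab = rng.choice(seeds)
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(p, centroids[lab])])
    return synthetic, outlier
```

Calling `oversample_minority(samples, target_size)` once per minority class, with `target_size` set to the majority class size, would balance a multi-class dataset under these assumptions. Keeping outliers out of the seed set is one simple way to limit synthetic points that drift into other class regions.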




Updated: 2021-06-21