CSS: Handling imbalanced data by improved clustering with stratified sampling
Concurrency and Computation: Practice and Experience (IF 1.5), Pub Date: 2020-12-22, DOI: 10.1002/cpe.6071
Lu Cao 1, 2 , Hong Shen 1

The traditional support vector machine (SVM) technique has drawbacks in dealing with imbalanced data. To address this issue, we propose an improved clustering with stratified sampling (CSS) algorithm that improves the classification performance of SVMs on imbalanced datasets. Instead of applying a single type of sampling method, as is common in the literature, our algorithm treats different types of classes with different sampling methods. For minority classes, the algorithm oversamples by adding normally distributed noise around every support vector to generate new samples. For majority classes, the samples of each class are first divided into clusters by an improved version of clustering by fast search and find of density peaks (CFSFDP), which captures the latent structure of that class; stratified sampling is then applied to extract samples from each subcluster of the majority class. Moreover, we extend this method into an ensemble classifier that uses multiple base SVM classifiers for prediction. Experimental results on several imbalanced classification datasets show that CSS is more effective than state-of-the-art sampling methods.
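
The two sampling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the noise scale `sigma`, the number of synthetic points per support vector, and the sampling `fraction` are all hypothetical choices made for illustration. The support vectors are assumed to come from an already trained SVM, and the cluster labels are assumed to be given by some clustering step (the paper uses an improved CFSFDP, which is not reproduced here).

```python
import random
from collections import defaultdict


def oversample_around(support_vectors, n_new_per_sv, sigma, rng):
    """Create synthetic minority samples by adding N(0, sigma^2) noise
    to every coordinate of each support vector (hypothetical helper)."""
    new_samples = []
    for sv in support_vectors:
        for _ in range(n_new_per_sv):
            new_samples.append([x + rng.gauss(0.0, sigma) for x in sv])
    return new_samples


def stratified_undersample(samples, cluster_labels, fraction, rng):
    """Keep roughly the same fraction of samples from every cluster,
    with at least one sample per cluster (hypothetical helper)."""
    by_cluster = defaultdict(list)
    for sample, label in zip(samples, cluster_labels):
        by_cluster[label].append(sample)
    kept = []
    for label in sorted(by_cluster):
        members = by_cluster[label]
        k = max(1, round(fraction * len(members)))
        kept.extend(rng.sample(members, k))  # sampling without replacement
    return kept


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumed input: support vectors of the minority class from a trained SVM.
minority_svs = [[0.0, 0.0], [1.0, 1.0]]
synthetic = oversample_around(minority_svs, n_new_per_sv=3, sigma=0.1, rng=rng)

# Assumed input: majority samples with cluster labels from a prior clustering step.
majority = [[float(i), float(i)] for i in range(10)]
labels = [0] * 6 + [1] * 4
reduced = stratified_undersample(majority, labels, fraction=0.5, rng=rng)

print(len(synthetic), len(reduced))
```

Sampling the same fraction from each cluster is one simple way to shrink the majority class while keeping every subcluster represented; the allocation rule actually used in the paper may differ.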

Updated: 2020-12-22