The ensemble of density-sensitive SVDD classifier based on maximum soft margin for imbalanced datasets
Knowledge-Based Systems (IF 8.8), Pub Date: 2021-02-26, DOI: 10.1016/j.knosys.2021.106897
Xinmin Tao, Wei Chen, Xiangke Li, Xiaohan Zhang, Yetong Li, Jie Guo

Imbalanced problems have recently attracted much attention due to their prevalence in numerous domains of great importance to the data mining community. However, conventional binary classification approaches, e.g., the support vector machine (SVM), generally perform poorly on imbalanced datasets, since they are designed to generalize over the whole training set and consequently pay little attention to the minority class. In this paper, we extend traditional support vector domain description (SVDD) and propose a novel density-sensitive SVDD classifier based on maximum soft margin (DSMSM-SVDD) for imbalanced datasets. In the proposed approach, relative density-based penalty weights are incorporated into the optimization objective to represent the importance of individual training samples. By optimizing the objective with these weights, majority-class training samples with high relative densities are more likely to lie inside the hypersphere, which alleviates the noise sensitivity of traditional SVDD. In addition, to make full use of the minority-class samples in refining the boundary during training, a maximum soft margin regularization term, inspired by the soft-margin maximization of traditional SVM, is introduced into the proposed technique. This allows the optimal domain description boundary to skew further toward the minority class than that of traditional SVDD, thereby improving classification accuracy. Finally, an AdaBoost ensemble version of DSMSM-SVDD is developed to further improve generalization performance and stability on imbalanced datasets. Extensive experimental results on various datasets demonstrate that the proposed approach significantly outperforms existing algorithms on imbalanced classification problems in terms of G-Mean, F-Measure and AUC performance measures.
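
The abstract only describes the optimization at a high level, so the following is a minimal illustrative sketch of what such an objective could look like, built on the standard SVDD-with-negative-examples formulation (kernel map \phi, center \mathbf{a}, radius R). The relative density weights w_i, margin variable \rho, and trade-off parameters C_1, C_2, \lambda are hypothetical symbols introduced here for illustration, not necessarily the paper's notation:

\[
\min_{R,\,\mathbf{a},\,\rho,\,\boldsymbol{\xi},\,\boldsymbol{\eta}}
\; R^{2} - \lambda\rho
+ C_{1}\sum_{i \in \mathrm{majority}} w_{i}\,\xi_{i}
+ C_{2}\sum_{l \in \mathrm{minority}} \eta_{l}
\]
\[
\text{s.t.}\quad
\lVert \phi(\mathbf{x}_{i}) - \mathbf{a} \rVert^{2} \le R^{2} + \xi_{i},\quad \xi_{i} \ge 0
\quad (\text{majority samples}),
\]
\[
\phantom{\text{s.t.}\quad}
\lVert \phi(\mathbf{x}_{l}) - \mathbf{a} \rVert^{2} \ge R^{2} + \rho - \eta_{l},\quad \eta_{l} \ge 0,\quad \rho \ge 0
\quad (\text{minority samples}).
\]

Here w_i is the relative density-based penalty weight of majority sample \mathbf{x}_i (large for dense, reliable samples; small for sparse, likely noisy ones), and the -\lambda\rho term rewards a large soft margin \rho between the hypersphere surface and the minority samples, which is what would push the description boundary toward the minority class.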
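Likewise, as a rough sketch of the "relative density" idea, here is a simple k-nearest-neighbour computation in Python; the function name, the choice of k, and the normalization are assumptions made for illustration rather than the paper's definition.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def relative_density_weights(X, k=5):
    """Illustrative k-NN relative density weights (sketch only; the paper
    may define relative density differently). Dense, reliable samples get
    weights close to 1, sparse or noisy samples get small weights."""
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dists, _ = nbrs.kneighbors(X)                # column 0 is each point itself
    local_density = 1.0 / (dists[:, 1:].mean(axis=1) + 1e-12)
    relative = local_density / local_density.mean()
    return relative / relative.max()             # scale into (0, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_major = rng.normal(size=(200, 2))          # dense majority cloud
    X_noise = rng.uniform(-6, 6, size=(10, 2))   # sparse, noise-like points
    w = relative_density_weights(np.vstack([X_major, X_noise]))
    print(w[:5], w[-5:])                         # noise points get small weights
```

Weights of this kind could then scale the per-sample slack penalties in the objective sketched above, so that leaving a low-density (likely noisy) majority sample outside the hypersphere is cheap, while excluding a high-density sample is expensive.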



Updated: 2021-03-03