
Increasing Minority Recall Support Vector Machine Model for Imbalanced Data Classification
Discrete Dynamics in Nature and Society (IF 1.3), Pub Date: 2021-05-05, DOI: 10.1155/2021/6647557
Chunye Wu, Nan Wang, Yu Wang

Imbalanced data classification is gaining importance in data mining and machine learning. The minority class recall rate requires special treatment in fields such as medical diagnosis, information security, industry, and computer vision. Because misclassifying even a few samples can cause serious losses in some physical problems, this paper proposes a new strategy and algorithm, based on a cost-sensitive support vector machine, that raises the minority class recall rate to 1. The proposed modification applies a margin compensation that makes the margin lopsided, which allows the decision boundary to drift. Once the boundary reaches a certain position, the minority class generalizes well enough to meet the requirement of a recall rate of 1. In the experiments, the effects of different parameters on the performance of the algorithm were analyzed, and the optimal parameters for a recall rate of 1 were determined. The experimental results show that, for the imbalanced data classification problem, the traditional definite-cost classification scheme and models selected by the area under the receiver operating characteristic curve (AUC) criterion rarely achieve a recall rate of 1. The new strategy can yield a minority recall of 1 for imbalanced data, as the loss of the majority class is acceptable; moreover, it improves the G-means index. The proposed algorithm outperforms the conventional methods in minority recall. The proposed method has practical significance in credit card fraud detection, medical diagnosis, and other areas.
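The idea of letting the decision boundary drift until minority recall reaches 1 can be illustrated with a small sketch. This is not the authors' margin-compensation algorithm, which the abstract does not specify. It simply adds an offset to the decision values of an already-trained classifier and reports the G-means index, taken here in its usual form as the geometric mean of the two per-class recalls. The function names, the label convention (+1 for the minority class, -1 for the majority class), and the toy scores are all assumptions made for illustration.

```python
import math

def recall(y_true, y_pred, label):
    """Fraction of samples whose true class is `label` that are predicted as `label`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total if total else 0.0

def g_mean(y_true, y_pred, minority=1, majority=-1):
    """Geometric mean of minority-class recall and majority-class recall."""
    return math.sqrt(recall(y_true, y_pred, minority) * recall(y_true, y_pred, majority))

def shift_for_full_recall(scores, y_true, minority=1):
    """Smallest non-negative offset that puts every minority sample at score >= 0."""
    lowest = min(s for s, t in zip(scores, y_true) if t == minority)
    return max(0.0, -lowest)

def predict(scores, offset=0.0, minority=1, majority=-1):
    """Assign the minority label when the shifted decision value is non-negative."""
    return [minority if s + offset >= 0 else majority for s in scores]

# Hypothetical decision values f(x) from some trained classifier (toy data, assumed).
scores = [-2.0, -1.5, -1.2, -0.9, -0.6, -0.4, 0.3, -0.5, 0.8, 1.1]
labels = [-1, -1, -1, -1, -1, -1, -1, 1, 1, 1]

base = predict(scores)                          # boundary left where it was
offset = shift_for_full_recall(scores, labels)  # drift needed for recall 1
shifted = predict(scores, offset)               # boundary moved toward the majority

for name, pred in (("base", base), ("shifted", shifted)):
    print(name, "minority recall:", recall(labels, pred, 1), "G-mean:", g_mean(labels, pred))
```

On this toy data the shift lifts minority recall from 2/3 to 1 at the cost of one additional majority error. An offset computed on the training labels guarantees a recall of 1 only on those samples, so in practice it would need to be validated on held-out data.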

Updated: 2021-05-06