LR-SMOTE – An improved unbalanced data set oversampling based on K-means and SVM
Knowledge-Based Systems (IF 8.8), Pub Date: 2020-04-02, DOI: 10.1016/j.knosys.2020.105845
X.W. Liang, A.P. Jiang, T. Li, Y.Y. Xue, G.T. Wang

Machine learning classification algorithms are now widely used, and one of the main problems they face is unbalanced data sets. Because classification algorithms are not sensitive to class imbalance, they have difficulty classifying unbalanced data sets. The same problem arises in loose particle detection for sealed electronic components: the signals generated by internal components always outnumber the signals generated by loose particles, which easily leads to misclassification. To classify unbalanced data sets more accurately, this paper proposes the LR-SMOTE algorithm, which builds on the traditional SMOTE oversampling algorithm. LR-SMOTE places newly generated samples close to the sample center and avoids generating outlier samples or changing the distribution of the data set. Experiments were carried out on four UCI public data sets and six self-built data sets. The data sets, both unmodified and balanced by the LR-SMOTE and SMOTE algorithms, were classified with the random forest and support vector machine algorithms. The experimental results show that LR-SMOTE performs better than SMOTE in terms of G-means value, F-measure value and AUC.
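For orientation, here is a minimal sketch of the traditional SMOTE interpolation step that the abstract names as its starting point, together with the usual geometric-mean (G-mean) score of sensitivity and specificity. It is not LR-SMOTE: the abstract does not describe how the K-means and SVM stages work, so they are left out. The function names `smote_sketch` and `g_mean` and the parameters `k` and `seed` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Classic SMOTE (assumed baseline, not LR-SMOTE): each synthetic point
    lies on the segment between a minority sample and one of its k nearest
    minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample, excluding itself
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    n_pos = sum(t == positive for t in y_true)
    n_neg = len(y_true) - n_pos
    return math.sqrt((tp / n_pos) * (tn / n_neg))
```

Because every synthetic point is a convex combination of two existing minority samples, the output stays inside the bounding box of the minority class. It can still land near outliers, which is the behaviour the abstract says LR-SMOTE aims to avoid.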



Updated: 2020-04-03