SOTB: Semi-supervised Oversampling Approach Based on Trigonal Barycenter Theory
IEEE Access (IF 3.4), Pub Date: 2020-01-01, DOI: 10.1109/access.2020.2980157
Dingxiang Liu, Shaojie Qiao, Nan Han, Tao Wu, Rui Mao, Yongqing Zhang, Chang-An Yuan, Yueqiang Xiao

Classifying imbalanced data is an active research direction in machine learning and bioinformatics. Data imbalance can greatly degrade the accuracy of classifiers. Good oversampling methods improve the diversity and validity of new samples, which not only addresses the imbalance in the sample data but also greatly improves classification accuracy. In this study, we propose the trigonal barycenter theory and a semi-supervised oversampling method called SOTB (Semi-supervised Oversampling method based on Trigonal Barycenter theory). SOTB works in two steps: (1) it constructs non-intersecting triangles based on Mahalanobis distance; (2) it combines a semi-supervised sampling method with trigonal barycenter theory to oversample the positive samples, which copes with the data imbalance problem without affecting the quality of the data. Extensive experiments were conducted to verify the effectiveness of the proposed method. The results demonstrate that SOTB improves the validity, diversity, and rationality of the distribution of the newly generated samples and alleviates the over-fitting that is common in existing oversampling approaches. In particular, compared with state-of-the-art oversampling methods, SOTB achieves the best classification performance.
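
The two steps named in the abstract can be illustrated with a rough sketch. This is not the authors' algorithm: the abstract does not specify how triangles are selected or how the semi-supervised step works, so the code below assumes a simple greedy grouping of each point with its two nearest unused neighbours under Mahalanobis distance, treats "non-intersecting" as "sharing no vertices", and omits the semi-supervised component entirely. The function names `pairwise_mahalanobis` and `barycenter_oversample` are hypothetical.

```python
import numpy as np


def pairwise_mahalanobis(X):
    """Pairwise Mahalanobis distances between the rows of X."""
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # Pseudo-inverse so a singular covariance matrix does not raise.
    cov_inv = np.linalg.pinv(cov)
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))


def barycenter_oversample(X_min):
    """Return one synthetic sample per vertex-disjoint triangle.

    Assumption (not from the paper): triangles are built greedily from
    the lowest-index unused point and its two nearest unused neighbours.
    Each synthetic sample is the barycenter (mean) of the three vertices.
    """
    X_min = np.asarray(X_min, dtype=float)
    dist = pairwise_mahalanobis(X_min)
    unused = set(range(len(X_min)))
    synthetic = []
    while len(unused) >= 3:
        i = min(unused)
        unused.remove(i)
        nearest = sorted(unused, key=lambda j: dist[i, j])[:2]
        unused.difference_update(nearest)
        synthetic.append(X_min[[i, *nearest]].mean(axis=0))
    return np.array(synthetic)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    minority = rng.normal(size=(9, 2))  # toy positive-class samples
    print(barycenter_oversample(minority).shape)
```

Because a barycenter always lies inside its triangle, every synthetic point in this sketch stays within the region spanned by existing positive samples, which is one plausible reading of the abstract's claim about not affecting data quality.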

Updated: 2020-01-01