Clustering-based undersampling with random over sampling examples and support vector machine for imbalanced classification of breast cancer diagnosis


Computer Assisted Surgery (IF 2.1), Pub Date: 2019-08-12, DOI: 10.1080/24699322.2019.1649074
Jue Zhang¹,², Li Chen¹


To address the two-class imbalanced classification problem that arises in breast cancer diagnosis, a hybrid model based on sample selection is proposed that combines Random Over-Sampling Examples (ROSE), K-means, and a Support Vector Machine (RK-SVM). ROSE is used to balance the dataset, which in turn improves the diagnostic accuracy of the Support Vector Machine (SVM). The model also introduces a clustering-based sample selection factor that encourages selecting samples near the class boundary. The purpose of clustering here is to reduce the risk of removing useful samples and to improve the efficiency of sample selection. To test its performance, the new hybrid classifier is applied to breast cancer datasets and three other datasets from the University of California Irvine (UCI) machine learning repository, all of which are commonly used in class-imbalanced learning. Extensive experimental results show that the proposed hybrid method outperforms most of the competing algorithms in terms of G-mean and accuracy. The experimental results also show that the method achieves superior performance on binary classification problems.
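
The abstract does not give implementation details, so the following is only a rough sketch of how the three ingredients might fit together, not the authors' code. It assumes NumPy and scikit-learn, binary labels coded 0/1, and several hypothetical choices of our own: the function names, approximating "near the class boundary" by distance to the nearest minority sample, a simplified ROSE-style smoothed bootstrap with a rule-of-thumb Gaussian bandwidth, and balancing both classes to the midpoint of their sizes. G-mean is taken in its usual sense, the square root of sensitivity times specificity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC


def kmeans_undersample(X_maj, X_min, n_keep, n_clusters=5, seed=0):
    """Keep roughly n_keep majority samples, chosen cluster by cluster.

    Assumption: closeness to the class boundary is approximated by the
    distance to the nearest minority sample; the paper's factor may differ.
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X_maj)
    # Distance from every majority sample to its nearest minority sample.
    dists = np.linalg.norm(X_maj[:, None, :] - X_min[None, :, :], axis=2).min(axis=1)
    keep = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        # Each cluster contributes in proportion to its size.
        quota = max(1, round(n_keep * len(idx) / len(X_maj)))
        keep.extend(idx[np.argsort(dists[idx])][:quota])
    return X_maj[np.array(keep)]


def rose_oversample(X_min, n_new, shrink=1.0, seed=0):
    """Simplified ROSE-style smoothed bootstrap for the minority class.

    Draws minority samples with replacement and perturbs each one with
    Gaussian noise whose per-feature scale follows a rule-of-thumb bandwidth.
    """
    rng = np.random.default_rng(seed)
    n, d = X_min.shape
    h = shrink * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4)) * X_min.std(axis=0, ddof=1)
    centers = X_min[rng.integers(0, n, size=n_new)]
    return centers + rng.normal(size=(n_new, d)) * h


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    sensitivity = np.mean(y_pred[y_true == positive] == positive)
    specificity = np.mean(y_pred[y_true != positive] != positive)
    return float(np.sqrt(sensitivity * specificity))


def fit_rk_svm_sketch(X, y, minority=1, seed=0):
    """Undersample the majority, oversample the minority, then fit an SVM."""
    X_min, X_maj = X[y == minority], X[y != minority]
    target = (len(X_min) + len(X_maj)) // 2  # hypothetical balancing point
    X_maj_sel = kmeans_undersample(X_maj, X_min, n_keep=target, seed=seed)
    X_new = rose_oversample(X_min, n_new=target - len(X_min), seed=seed)
    X_bal = np.vstack([X_maj_sel, X_min, X_new])
    y_bal = np.r_[np.full(len(X_maj_sel), 1 - minority),
                  np.full(len(X_min) + len(X_new), minority)]
    model = SVC(kernel="rbf", gamma="scale").fit(X_bal, y_bal)
    return model, y_bal
```

In practice the features would normally be standardized before both the distance computations and the SVM, and resampling should be applied to the training split only, so that G-mean and accuracy are measured on untouched test data.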


Updated: 2019-08-12
