Machine Learning Model for Breast Cancer Data Analysis Using Triplet Feature Selection Algorithm
IETE Journal of Research (IF 1.3), Pub Date: 2021-08-17, DOI: 10.1080/03772063.2021.1963861
Dhivya P. 1 , Bazilabanu A. 1 , Thirumalaikolundusubramanian Ponniah 2

Machine learning techniques can support clinical investigation in breast cancer diagnosis. Researchers have investigated various machine learning algorithms for diagnosing the disease, including Support Vector Machine, Naïve Bayes, Logistic Regression (LR), Random Forest, Decision Tree and K-Nearest Neighbor. Detecting breast cancer cells early from the available features is essential. Feature selection reduces the number of input features in order to improve model performance. This research aims to increase accuracy, sensitivity and specificity, and to reduce the False Positive Rate (FPR) and False Negative Rate (FNR), through feature selection. The proposed feature selection technique comprises two phases: feature grouping and feature selection. In the first phase, feature grouping uses Pearson correlation to measure the correlation among the features and groups them into high-, medium- and low-level rankings. In the second phase, the proposed Triplet Feature Selection (TFS) method is applied to avoid collinearity among the features: features are selected based on the correlation differences within each subset when the "race condition" is satisfied. Finally, the features in the triplet group are selected and the LR classification technique is applied to diagnose the disease. The proposed classifier achieved an accuracy of 95.4%, an FPR of 1%, an FNR of 4%, a sensitivity of 97% and a specificity of 96% in distinguishing benign from malignant cases. The performance of the proposed framework, TFS feature selection combined with the LR classifier, was compared with existing feature selection methods and classifiers.
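The two-phase idea in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' TFS algorithm, whose triplet construction and "race condition" are not specified in the abstract. Phase 2 is replaced here by a plain greedy collinearity filter, and the thresholds (`hi`, `lo`, `max_r`), the function names and the toy feature values are all assumptions made for illustration. The last function shows the standard confusion-matrix definitions of the metrics the abstract reports.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def group_features(columns, labels, hi=0.6, lo=0.3):
    """Phase 1 (illustrative): bucket features by |r| with the label.

    The cut-offs hi and lo are assumed values, not taken from the paper.
    """
    relevance = {name: abs(pearson(col, labels)) for name, col in columns.items()}
    groups = {"high": [], "medium": [], "low": []}
    for name, r in relevance.items():
        level = "high" if r >= hi else "medium" if r >= lo else "low"
        groups[level].append(name)
    return groups, relevance


def drop_collinear(columns, relevance, max_r=0.9):
    """Phase 2 stand-in (NOT the paper's TFS): greedy collinearity filter.

    Visit features from most to least label-relevant and keep one only if
    its |r| with every already-kept feature stays below max_r.
    """
    kept = []
    for name in sorted(relevance, key=relevance.get, reverse=True):
        if all(abs(pearson(columns[name], columns[k])) < max_r for k in kept):
            kept.append(name)
    return kept


def diagnostic_rates(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, FPR and FNR from predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
    }


# Made-up toy data: 1 = malignant, 0 = benign; "perimeter" is roughly 2*pi*radius.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "radius": [5.1, 4.8, 5.5, 5.0, 2.0, 2.2, 1.9, 2.4],
    "perimeter": [32.0, 30.2, 34.6, 31.4, 12.6, 13.8, 11.9, 15.1],
    "noise": [1, 2, 1, 2, 1, 2, 1, 2],
}
groups, relevance = group_features(columns, labels)
selected = drop_collinear(columns, relevance)
print(groups)
print(selected)
```

On the toy data, `radius` and `perimeter` are nearly collinear, so the filter keeps only one of them. A classifier such as logistic regression would then be trained on the `selected` columns and its predictions passed to `diagnostic_rates`. Under these standard definitions, FPR equals 1 minus specificity and FNR equals 1 minus sensitivity; the abstract does not state how its reported rates were computed.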




Updated: 2021-08-17