EKNN: Ensemble classifier incorporating connectivity and density into kNN with application to cancer diagnosis
Artificial Intelligence in Medicine (IF 6.1), Pub Date: 2020-11-08, DOI: 10.1016/j.artmed.2020.101985
Mohamed A Mahfouz¹, Amin Shoukry², Mohamed A Ismail¹

In the microarray-based approach to automated cancer diagnosis, applying the traditional k-nearest neighbors (kNN) algorithm faces several difficulties: the large number of genes (a high-dimensional feature space), many of them irrelevant (noise), relative to the small number of available samples, and the imbalance in the number of samples across the target classes. This research proposes an ensemble classifier built from decision models derived from kNN that is applicable to problems characterized by small, imbalanced datasets. The proposed classification method is an ensemble of the traditional kNN algorithm and four novel classification models derived from it. These models exploit the increase in density and connectivity, using a K1-nearest neighbors table (KNN-table) created during the training phase. In the density model, an unseen sample u is assigned to the class t whose density increases the most when u is added to it, i.e. u can replace more neighbors in the KNN-table entries of class-t samples than in those of any other class. In the other three models (the connectivity models), the mean and standard deviation of the distributions of the average, minimum and maximum distances from the members of each class to their K neighbors are computed in the training phase. The class t to whose distribution u has the highest likelihood of belonging is chosen, i.e. adding u to the samples of this class produces the least change in the distribution of the corresponding decision model for class t. Combining the predictions of the four individual models with that of traditional kNN makes the decision space more discriminative. With the help of the KNN-table, which can be updated online in the training phase, improved performance is achieved compared to the traditional kNN algorithm, at the cost of a slight increase in classification time.
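The abstract describes the four derived models only in words. The Python sketch below illustrates one possible reading of them, not the authors' implementation. Everything beyond the abstract is an assumption: Euclidean distance, a KNN-table built per class from same-class neighbours, a density score normalised by class size, connectivity models that pick the class with the smallest z-score for the average, minimum or maximum neighbour distance, and an unweighted majority vote over the five predictions. The class name `EKNNSketch` and its methods are hypothetical.

```python
import numpy as np
from collections import Counter


class EKNNSketch:
    """Hypothetical sketch: plain kNN + 1 density vote + 3 connectivity votes.

    Assumptions (not from the paper): Euclidean distance, per-class KNN-tables
    built from same-class neighbours, density normalised by class size,
    z-score comparison for the connectivity models, unweighted majority vote.
    """

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        self.classes = np.unique(self.y).tolist()
        self.tables, self.stats = {}, {}
        for t in self.classes:
            Xt = self.X[self.y == t]
            if len(Xt) < 2:
                raise ValueError("each class needs at least two samples")
            kt = min(self.k, len(Xt) - 1)
            d = np.linalg.norm(Xt[:, None, :] - Xt[None, :, :], axis=2)
            np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
            table = np.sort(d, axis=1)[:, :kt]  # KNN-table rows for class t
            self.tables[t] = table
            # avg/min/max neighbour distance per member -> (mean, std) per class
            feats = np.stack([table.mean(1), table.min(1), table.max(1)], axis=1)
            self.stats[t] = (feats.mean(0), feats.std(0) + 1e-12)  # avoid /0
        return self

    def _votes(self, u):
        u = np.asarray(u, dtype=float)
        # Vote 1: traditional kNN over the whole training set
        nearest = self.y[np.argsort(np.linalg.norm(self.X - u, axis=1))[: self.k]]
        votes = [Counter(nearest.tolist()).most_common(1)[0][0]]
        density, z = {}, {}
        for t in self.classes:
            table = self.tables[t]
            du = np.linalg.norm(self.X[self.y == t] - u, axis=1)
            # fraction of class-t rows in which u would displace the K-th neighbour
            density[t] = np.mean(du < table[:, -1])
            near = np.sort(du)[: table.shape[1]]  # u's K nearest class-t members
            mean, std = self.stats[t]
            feats = np.array([near.mean(), near.min(), near.max()])
            z[t] = np.abs((feats - mean) / std)  # deviation from class-t distribution
        # Vote 2: density model (largest density increase)
        votes.append(max(self.classes, key=lambda t: density[t]))
        # Votes 3-5: connectivity models on avg, min and max distance
        for j in range(3):
            votes.append(min(self.classes, key=lambda t: z[t][j]))
        return votes

    def predict(self, U):
        # Unweighted majority; ties go to the tied label that was voted first
        return np.array([Counter(self._votes(u)).most_common(1)[0][0]
                         for u in np.asarray(U, dtype=float)])


X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(EKNNSketch(k=3).fit(X, y).predict([[0.5, 0.5], [10.5, 10.5]]))  # [0 1]
```

Scoring the connectivity models by a Gaussian likelihood instead of a z-score, or weighting the five votes, would be equally plausible readings of the abstract; the sketch picks the simplest option for each.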
The proposed ensemble method achieves a significant increase in accuracy over each of its base classifiers on the Kentridge, GDS3257, Notterman, Leukemia and CNS datasets. The method is also compared with several existing ensemble methods and state-of-the-art techniques on several standard datasets, using different dimensionality reduction techniques. The results demonstrate the clear superiority of EKNN over several individual and ensemble classifiers, regardless of the gene selection strategy chosen.



Updated: 2020-11-21