Gene encoder: a feature selection technique through unsupervised deep learning-based clustering for large gene expression data
Neural Computing and Applications (IF 6), Pub Date: 2020-06-14, DOI: 10.1007/s00521-020-05101-4
Uzma, Feras Al-Obeidat, Abdallah Tubaishat, Babar Shah, Zahid Halim

Cancer is a severe condition in which uncontrolled cell division forms a tumor that can spread to other tissues of the body, so new medications and treatment methods are in demand. Classification of microarray data plays a vital role in this effort, and selecting the relevant genes is an important step in that classification. This work presents gene encoder, an unsupervised two-stage feature selection technique for classifying cancer samples. The first stage aggregates three filter methods: principal component analysis, correlation, and spectral-based feature selection. The second stage applies a genetic algorithm that evaluates each chromosome using autoencoder-based clustering. The resulting feature subset is used for the classification task. Three classifiers, namely support vector machine, k-nearest neighbors, and random forest, are used to avoid dependency on any one classifier. Six benchmark gene expression datasets are used for performance evaluation, and the method is compared with four state-of-the-art related algorithms. Three sets of experiments evaluate the proposed method: assessing the selected features through sample-based clustering, tuning the parameters, and selecting the better-performing classifier. The comparison is based on accuracy, recall, false positive rate, precision, F-measure, and entropy. The results suggest that the proposed method performs better than the compared algorithms.
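
The evaluation measures named in the abstract can be illustrated with a short sketch. This is not the authors' code: it assumes a binary classification setting with the standard confusion-matrix definitions, and it assumes that "entropy" means the size-weighted class entropy of a clustering, which the abstract does not specify. The function names `binary_metrics` and `clustering_entropy` are hypothetical.

```python
import math
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, recall, false positive rate, precision, and F-measure.

    Assumes a binary task; every label other than `positive` counts as negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "recall": recall,
        "false_positive_rate": ratio(fp, fp + tn),
        "precision": precision,
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }


def clustering_entropy(labels, clusters):
    """Size-weighted mean of per-cluster class entropy, in bits.

    Assumed definition (not taken from the paper): 0.0 means every
    cluster contains samples of a single class.
    """
    members = {}
    for label, cluster in zip(labels, clusters):
        members.setdefault(cluster, []).append(label)

    n = len(labels)
    total = 0.0
    for cluster_labels in members.values():
        size = len(cluster_labels)
        counts = Counter(cluster_labels).values()
        h = -sum((c / size) * math.log2(c / size) for c in counts)
        total += (size / n) * h
    return total
```

Calling `binary_metrics(y_true, y_pred)` returns a dictionary keyed by measure name, and `clustering_entropy(labels, clusters)` returns a single score where lower values indicate purer clusters.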




Updated: 2020-06-14