Fuzzy measure with regularization for gene selection and cancer prediction
International Journal of Machine Learning and Cybernetics (IF 3.1), Pub Date: 2021-04-20, DOI: 10.1007/s13042-021-01319-3
JinFeng Wang, ZhenYu He, ShuaiHui Huang, Hao Chen, WenZhong Wang, Farhad Pourpanah

High-dimensional gene expression data are challenging to analyze, and selecting multiple informative subsets of genes is crucial for cancer classification. Many statistical and machine learning methods with regularization have been developed for this purpose. However, these methods neglect epistasis, i.e., the fact that some genes may mask or affect other genes. In this article, we propose a fuzzy measure with regularization (FMR), which adopts the L1 and L1/2 norms to obtain sparse solutions, to describe the interaction between genes. Regularization with L1 and L1/2 yields a series of sparse solutions, which helps solve for the fuzzy measure more quickly than traditional methods such as the Genetic Algorithm. FMR obtains a subset of genes corresponding to the fewest nonzero fuzzy measure values and then selects the important gene(s) according to how frequently they appear in the selected gene subsets. In addition, three base classifiers (SVM, KNN and DBN) are employed as underlying models to verify the effectiveness of the selected subset(s) of genes. Experimental results indicate that the genes selected by FMR are consistent with several clinical studies. Moreover, FMR achieves accuracy comparable to that of other methods reported in the literature. The code used in this article is freely available at: https://github.com/wangphoenix/ICMLC.
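
The pipeline summarized in the abstract (fit a sparse fuzzy measure, then rank genes by how often they appear in the subsets with nonzero measure values) can be illustrated with a minimal sketch. This is not the authors' implementation; their code is in the repository linked above. The sketch assumes that the fuzzy measure enters through a Choquet integral, which is linear in the measure values, that expression values are nonnegative, and that only the L1 penalty is used, solved here with plain iterative soft-thresholding (ISTA). The function names `nonempty_subsets`, `choquet_row`, `ista_l1` and `gene_frequency`, the toy data, and the value of `lam` are all hypothetical.

```python
import itertools
from collections import Counter

import numpy as np


def nonempty_subsets(n):
    """All nonempty subsets of genes 0..n-1, ordered by size."""
    return [s for r in range(1, n + 1)
            for s in itertools.combinations(range(n), r)]


def choquet_row(x, subsets):
    """Coefficients z_A such that Choquet(x) = sum_A z_A * mu(A).

    Assumes nonnegative inputs: z_A = max(0, min_{i in A} x_i - max_{i not in A} x_i),
    where the maximum over an empty set is taken as 0.
    """
    n = len(x)
    row = []
    for s in subsets:
        inside = min(x[i] for i in s)
        outside = max((x[i] for i in range(n) if i not in s), default=0.0)
        row.append(max(inside - outside, 0.0))
    return np.array(row)


def ista_l1(Z, y, lam, n_iter=2000):
    """Minimize 0.5 * ||Z mu - y||^2 + lam * ||mu||_1 by soft-thresholding."""
    step = 1.0 / (np.linalg.norm(Z, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    mu = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        v = mu - step * (Z.T @ (Z @ mu - y))
        mu = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)
    return mu


def gene_frequency(mu, subsets, tol=1e-8):
    """Count how often each gene appears in subsets with a nonzero measure value."""
    counts = Counter()
    for value, s in zip(mu, subsets):
        if abs(value) > tol:
            counts.update(s)
    return counts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_genes = 4                                   # toy size: 2**4 - 1 = 15 subsets
    subsets = nonempty_subsets(n_genes)
    X = rng.random((60, n_genes))                 # hypothetical expression values
    Z = np.array([choquet_row(x, subsets) for x in X])
    true_mu = np.zeros(len(subsets))
    true_mu[subsets.index((1,))] = 1.0            # synthetic ground truth
    true_mu[subsets.index((1, 2))] = 0.5
    y = Z @ true_mu
    mu = ista_l1(Z, y, lam=0.05)
    print(gene_frequency(mu, subsets).most_common())
```

The number of subsets grows as 2^n - 1, so enumerating them as above is only feasible for a small number of candidate genes. An L1/2 penalty would need a different thresholding operator and is not covered by this sketch.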




Updated: 2021-04-20