Comprehensive relative importance analysis and its applications to high dimensional gene expression data analysis
Knowledge-Based Systems (IF 7.2), Pub Date: 2020-06-08, DOI: 10.1016/j.knosys.2020.106120
Zixin Shen, Argon Chen

Identifying important genes is challenging not only because gene expression data are high dimensional, but also because the expression levels of genes from the same pathway are often highly correlated. A large number of feature selection methods have been proposed to select a subset of genes for the interpretation and prediction of certain phenotypes. Among them, L1 penalization-based methods, such as the lasso, adaptive lasso, and elastic net, have attracted the most attention. However, the L1 penalty employed by these methods is known to have difficulty selecting a group of highly correlated features. The problem of identifying important, highly correlated features, on the other hand, has been well studied in multiple regression analysis with a sufficient sample size. In particular, relative weight analysis is known to be effective in measuring the relative importance of correlated features. However, relative weight analysis requires a full-column-rank feature matrix and is therefore infeasible for high dimensional problems. In this research, a comprehensive relative importance analysis is proposed and proven valid without restrictions on sample size or matrix rank. Simulations and real cases are used to show the effectiveness of the proposed method in selecting relevant features, especially for high dimensional data.
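
For readers unfamiliar with the classical relative weight analysis mentioned in the abstract, the following is a minimal sketch of how it is usually computed, and of why it breaks down when the feature matrix is not of full column rank. It is not the comprehensive method proposed in the paper, whose details the abstract does not give. The function name `relative_weights` and the tolerance `tol` are assumptions made for illustration.

```python
import numpy as np


def relative_weights(X, y, tol=1e-10):
    """Classical relative weights (illustrative sketch, not the paper's method).

    X: (n, p) feature matrix, y: (n,) response.
    Returns p non-negative weights that sum to the R^2 of the
    ordinary least-squares regression of y on X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Standardize so that cross-products are correlations.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    ys = (y - y.mean()) / y.std(ddof=1)
    R = Xs.T @ Xs / (n - 1)        # feature correlation matrix
    r_xy = Xs.T @ ys / (n - 1)     # feature-response correlations

    # Symmetric square root of R via its eigendecomposition.
    d, V = np.linalg.eigh(R)
    if d.min() <= tol:
        # This is the full-column-rank requirement: with p > n (or
        # collinear features) R is singular and the weights are undefined.
        raise ValueError("feature correlation matrix is singular")
    Lam = V @ np.diag(np.sqrt(d)) @ V.T

    # Regress y on the orthogonalized features Z = Xs @ inv(Lam).
    beta = np.linalg.solve(Lam, r_xy)

    # Map the squared coefficients back to the original features.
    return (Lam ** 2) @ (beta ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    X[:, 1] = 0.9 * X[:, 0] + 0.1 * X[:, 1]   # two highly correlated features
    y = X[:, 0] + X[:, 1] + rng.normal(size=200)
    print(relative_weights(X, y))
```

Calling the same function on a matrix with more columns than rows (for example 5 samples and 10 features) raises the `ValueError`, which is the rank restriction the abstract says the proposed analysis removes.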




Updated: 2020-06-08