Network-regularized high-dimensional Cox regression for analysis of genomic data
Statistica Sinica (IF 1.5), Pub Date: 2014-01-01, DOI: 10.5705/ss.2012.317
Hokeun Sun 1, Wei Lin 2, Rui Feng 2, Hongzhe Li 2

We consider estimation and variable selection in high-dimensional Cox regression when prior knowledge of the relationships among the covariates, described by a network or graph, is available. A limitation of existing methodology for survival analysis with high-dimensional genomic data is that the wealth of structural information about many biological processes, such as regulatory networks and pathways, is often ignored. To incorporate such prior network information into the analysis of genomic data, we propose a network-based regularization method for high-dimensional Cox regression; it uses an ℓ1-penalty to induce sparsity in the regression coefficients and a quadratic Laplacian penalty to encourage smoothness between the coefficients of neighboring variables on a given network. The proposed method is implemented by an efficient coordinate descent algorithm. In the setting where the dimensionality p can grow exponentially fast with the sample size n, we establish model selection consistency and estimation bounds for the proposed estimators. The theoretical results provide insight into the gain from taking the network structural information into account. Extensive simulation studies indicate that our method outperforms Lasso and elastic net in terms of variable selection accuracy and stability. We apply our method to a breast cancer gene expression study and identify several biologically plausible subnetworks and pathways that are associated with breast cancer distant metastasis.
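As a rough illustration of the objective described in the abstract, the sketch below evaluates a Cox negative log partial likelihood and adds an ℓ1 term plus a quadratic Laplacian term. It is not the authors' implementation: the function names, the (lam, alpha) parametrization of the two penalties, the use of the unnormalized Laplacian L = D - A, and the assumption of no tied event times are all choices made here for illustration only, and the coordinate descent solver is not shown.

```python
import numpy as np


def graph_laplacian(adj):
    """Unnormalized Laplacian L = D - A of an undirected graph (an assumed choice)."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj


def neg_log_partial_likelihood(beta, X, time, event):
    """Cox negative log partial likelihood divided by n, assuming no tied times."""
    eta = X @ beta
    order = np.argsort(-time)            # sort subjects by decreasing time
    eta_sorted = eta[order]
    # Running log-sum-exp gives log of sum_{j: t_j >= t_i} exp(eta_j) per subject.
    log_risk = np.logaddexp.accumulate(eta_sorted)
    observed = event[order].astype(bool)  # only uncensored subjects contribute
    return -np.sum(eta_sorted[observed] - log_risk[observed]) / len(time)


def network_penalty(beta, L, lam, alpha):
    """lam * (alpha * ||beta||_1 + (1 - alpha) * beta' L beta); hypothetical scaling."""
    return lam * (alpha * np.abs(beta).sum() + (1.0 - alpha) * (beta @ L @ beta))


def objective(beta, X, time, event, L, lam, alpha):
    """Penalized objective that a solver would minimize over beta."""
    return neg_log_partial_likelihood(beta, X, time, event) + network_penalty(
        beta, L, lam, alpha
    )


# Toy example: three covariates connected as a path graph 0 - 1 - 2.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
L = graph_laplacian(adj)
smooth = np.array([1.0, 1.0, 1.0])   # equal coefficients on neighboring nodes
rough = np.array([1.0, 1.0, 0.0])    # one edge with differing coefficients
print(smooth @ L @ smooth, rough @ L @ rough)
```

With the unnormalized Laplacian, the quadratic term equals the sum of squared coefficient differences over the edges of the graph, so it vanishes when connected covariates share the same coefficient and grows as neighboring coefficients diverge.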

Updated: 2014-01-01