Sparsity-restricted estimation for the accelerated failure time model
Statistics and Its Interface (IF 0.8), Pub Date: 2021-08-11, DOI: 10.4310/21-sii669
Xiaoyu Zhang 1, Yunpeng Zhou 1, Jinfeng Xu 2, Kam Chuen Yuen 1

In many biomedical studies, such as high-throughput microarray or RNA-sequencing (RNA-seq) gene expression analyses, it is of practical interest to link gene expression profiles to censored survival phenotypes, for example, time to cancer recurrence or time to death. Because the number of genes greatly exceeds the sample size and survival data carry complications such as right censoring, regularized methods that combine a rank-based loss function with a penalty are often used to identify relevant prognostic biomarkers and yield parsimonious prediction models for event times. Existing penalization methods for survival data often use an $\ell_1$ penalty to approximate the sparsity constraint, which is numerically convenient because the penalty is convex. In practice, however, the $\ell_1$ approximation also leads to an inflated model size, relative to the ideal sparsity-restricted method, when a desired cross-validated prediction error is to be achieved. In this paper, we consider sparsity-restricted estimation in the accelerated failure time (AFT) model for censored survival data. For its numerical implementation, we propose an efficient and fast two-stage procedure that combines a convex regularized Gehan rank regression with a simple hard-thresholding estimation. The effectiveness of the proposed method is demonstrated by extensive simulation studies and real-data applications.
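
The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything below is an assumption rather than the authors' implementation: the loss is the standard Gehan rank loss $n^{-2}\sum_i\sum_j \delta_i\,\{e_i(\beta)-e_j(\beta)\}^-$ with residuals $e_i(\beta)=\log T_i - x_i^\top\beta$; stage one is solved here by a plain proximal subgradient iteration with an $\ell_1$ penalty (a stand-in, since the paper's solver is not stated); stage two keeps the `k` largest coefficients in absolute value. All function names (`gehan_loss`, `l1_gehan`, `hard_threshold`) and parameters (`lam`, `k`, `step`, `n_iter`) are hypothetical.

```python
import numpy as np


def gehan_loss(beta, X, log_t, delta):
    """Gehan rank loss: n^-2 * sum_ij delta_i * max(e_j - e_i, 0)."""
    e = log_t - X @ beta                      # AFT residuals on the log scale
    diff = e[:, None] - e[None, :]            # diff[i, j] = e_i - e_j
    return np.sum(delta[:, None] * np.maximum(-diff, 0.0)) / len(e) ** 2


def gehan_subgradient(beta, X, log_t, delta):
    """One subgradient of the Gehan loss with respect to beta."""
    e = log_t - X @ beta
    diff = e[:, None] - e[None, :]
    w = delta[:, None] * (diff < 0)           # pairs (i, j) where the loss is active
    # Each active pair contributes (x_i - x_j); sum them via row/column totals.
    return (w.sum(axis=1) - w.sum(axis=0)) @ X / len(e) ** 2


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def l1_gehan(X, log_t, delta, lam, n_iter=500, step=1.0):
    """Stage 1 (assumed solver): proximal subgradient for the l1-penalized loss."""
    beta = np.zeros(X.shape[1])
    for it in range(1, n_iter + 1):
        eta = step / np.sqrt(it)              # diminishing step size
        grad = gehan_subgradient(beta, X, log_t, delta)
        beta = soft_threshold(beta - eta * grad, eta * lam)
    return beta


def hard_threshold(beta, k):
    """Stage 2 (assumed form): keep the k largest |beta_j|, zero out the rest."""
    out = np.zeros_like(beta)
    if k <= 0:
        return out
    keep = np.argsort(np.abs(beta))[-k:]      # indices of the k largest magnitudes
    out[keep] = beta[keep]
    return out


def two_stage_estimate(X, log_t, delta, lam, k):
    """Convex l1 fit followed by hard thresholding to at most k nonzeros."""
    return hard_threshold(l1_gehan(X, log_t, delta, lam), k)
```

Here `X` is an `n × p` design matrix, `log_t` the log observed times, and `delta` a float array of event indicators (1 for an observed event, 0 for right censoring). In this sketch the sparsity level `k` and the penalty `lam` are tuning parameters that would typically be chosen by cross-validation; whether the published procedure refits the selected coefficients after thresholding is not specified in the abstract.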

Updated: 2021-08-12