Conditional screening for ultrahigh-dimensional survival data in case-cohort studies
Lifetime Data Analysis (IF 1.2), Pub Date: 2021-08-20, DOI: 10.1007/s10985-021-09531-7
Jing Zhang, Haibo Zhou, Yanyan Liu, Jianwen Cai

The case-cohort design has been widely used to reduce the cost of covariate measurements in large cohort studies. In many such studies, the number of covariates is very large, and the goal of the research is to identify the active covariates, that is, those with a strong influence on the response. Since the introduction of sure independence screening, screening procedures have been highly successful at reducing dimensionality and identifying active covariates. However, commonly used screening methods are based on marginal correlation or its variants, so they may fail to identify hidden active variables, which are jointly important but only weakly correlated with the response. Moreover, these screening methods were mainly proposed for data obtained under simple random sampling and cannot be applied directly to case-cohort data. In this paper, we consider ultrahigh-dimensional survival data under the case-cohort design and propose a conditional screening method that incorporates important prior information about known active variables. This method can effectively detect hidden active variables. Furthermore, it possesses the sure screening property under some mild regularity conditions and does not require any complicated numerical optimization. We evaluate the finite-sample performance of the proposed method via extensive simulation studies and further illustrate the new approach with a real data set from patients with breast cancer.
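To make the notion of a hidden active variable concrete, here is a minimal toy sketch. It is not the paper's method: it assumes a plain linear model with no censoring and no case-cohort sampling, and it uses partial correlation as a stand-in for conditional screening. The function names `marginal_scores` and `conditional_scores`, the coefficients, and the simulated design are all illustrative assumptions.

```python
import numpy as np


def marginal_scores(X, y):
    """Absolute Pearson correlation of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))


def conditional_scores(X, y, known):
    """Absolute partial correlation of each column with y, given columns `known`.

    Columns listed in `known` receive a score of 0 because they are
    already selected and do not need to be screened.
    """
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X[:, known]])
    # Remove the linear effect of the known variables from X and from y.
    coef_X, *_ = np.linalg.lstsq(Z, X, rcond=None)
    coef_y, *_ = np.linalg.lstsq(Z, y, rcond=None)
    RX = X - Z @ coef_X
    ry = y - Z @ coef_y
    rest = [j for j in range(p) if j not in known]
    scores = np.zeros(p)
    scores[rest] = np.abs(RX[:, rest].T @ ry) / (
        np.linalg.norm(RX[:, rest], axis=0) * np.linalg.norm(ry)
    )
    return scores


# Simulated data (assumed design): column 0 is a known active variable,
# column 1 is a hidden active variable, and the remaining columns are noise.
rng = np.random.default_rng(0)
n, p = 2000, 50
X = rng.standard_normal((n, p))
X[:, 1] = 0.5 * X[:, 0] + np.sqrt(0.75) * X[:, 1]  # corr(X0, X1) = 0.5

# Coefficients chosen so that cov(X1, y) = 2 * 0.5 - 1 = 0 in the population:
# X1 matters jointly but is marginally uncorrelated with y.
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.standard_normal(n)

marginal = marginal_scores(X, y)
conditional = conditional_scores(X, y, known=[0])

print("marginal score of hidden variable:   ", round(float(marginal[1]), 3))
print("conditional score of hidden variable:", round(float(conditional[1]), 3))
```

In this toy setup the hidden variable's marginal score is close to zero, so a marginal ranking would place it among the noise columns, while its score after conditioning on the known variable is large. The case-cohort survival setting of the paper would need a different screening statistic that accounts for censoring and the sampling design.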




Updated: 2021-08-23