Empirical evaluation of sub-cohort sampling designs for risk prediction modeling, Journal of Applied Statistics



Journal of Applied Statistics (IF 1.2), Pub Date: 2020-12-21, DOI: 10.1080/02664763.2020.1861225
Myeonggyun Lee (1), Anne Zeleniuch-Jacquotte (1, 2), Mengling Liu (1, 2)


ABSTRACT

Sub-cohort sampling designs, such as nested case-control (NCC) and case-cohort (CC) studies, have been widely used to estimate biomarker-disease associations because of their cost-effectiveness. These designs have been well studied and shown to maintain relatively high efficiency compared with full-cohort designs, but their performance in building risk prediction models has received less attention. Moreover, sub-cohort sampling designs often use matching (or stratification) to further control for confounders or to reduce measurement error, so their predictive performance depends on both the design and the matching procedure. Based on a dataset from the NYU Women's Health Study (NYUWHS), we performed Monte Carlo simulations to systematically evaluate risk prediction performance under NCC, CC, and full-cohort studies. Our simulations demonstrate that sub-cohort sampling designs can achieve predictive accuracy (i.e., discrimination and calibration) similar to that of the full-cohort design, but can be sensitive to the matching procedure used. Our results suggest that researchers can opt for NCC and CC studies, with large potential savings in cost and resources, but need to pay particular attention to the matching procedure when developing a risk prediction model in biomarker studies.
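The abstract does not describe how the NCC and CC samples were drawn, so the following is only a minimal sketch of the two textbook sampling schemes, not the authors' simulation code. It assumes a toy cohort with exponential event times and administrative censoring; the names `Subject`, `simulate_cohort`, `case_cohort_sample`, and `nested_case_control_sample`, as well as all parameter values, are hypothetical. Matching on covariates, which the abstract identifies as influential, is deliberately left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    sid: int
    time: float   # follow-up time (event time or censoring time)
    event: bool   # True if the subject became a case at `time`


def simulate_cohort(n, rng, hazard=0.02, max_follow_up=10.0):
    """Toy cohort: exponential event times, censored at max_follow_up (assumed values)."""
    cohort = []
    for sid in range(n):
        t = rng.expovariate(hazard)
        cohort.append(Subject(sid, min(t, max_follow_up), t <= max_follow_up))
    return cohort


def case_cohort_sample(cohort, subcohort_size, rng):
    """Case-cohort: a random subcohort drawn at baseline, plus every case."""
    subcohort = {s.sid for s in rng.sample(cohort, subcohort_size)}
    cases = {s.sid for s in cohort if s.event}
    return subcohort | cases


def nested_case_control_sample(cohort, controls_per_case, rng):
    """Nested case-control: for each case, draw controls from its risk set,
    i.e. subjects still under follow-up at the case's event time."""
    matched_sets = {}
    for case in (s for s in cohort if s.event):
        risk_set = [s for s in cohort
                    if s.time >= case.time and s.sid != case.sid]
        k = min(controls_per_case, len(risk_set))
        matched_sets[case.sid] = [s.sid for s in rng.sample(risk_set, k)]
    return matched_sets


rng = random.Random(0)
cohort = simulate_cohort(2000, rng)
cc_ids = case_cohort_sample(cohort, 200, rng)          # IDs needing biomarker assays
ncc_sets = nested_case_control_sample(cohort, 2, rng)  # case ID -> control IDs
```

In either design, only the sampled subjects would need biomarker measurements, which is where the cost savings come from. A prediction model fitted to such a sample generally needs design-specific weighting or conditional likelihood methods before its discrimination and calibration can be compared with a full-cohort fit.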


Updated: 2020-12-21
