Design matters in patient-level prediction: evaluation of a cohort vs. case-control design when developing predictive models in observational healthcare datasets
Journal of Big Data (IF 8.6), Pub Date: 2021-08-16, DOI: 10.1186/s40537-021-00501-2
Jenna M. Reps 1 , Patrick B. Ryan 1 , Martijn J. Schuemie 1 , Peter R. Rijnbeek 2

Background

The design used to create labelled data for training prediction models from observational healthcare databases (e.g., case-control or cohort) may affect the clinical usefulness of the resulting models. We aim to investigate hypothetical design issues and determine how the design impacts prediction model performance.

Aim

To empirically investigate differences between models developed using a case-control design and a cohort design.

Methods

Using a US claims database, we replicated two published prediction models (dementia and type 2 diabetes) that were developed using a case-control design, and trained models for the same prediction questions using cohort designs. We validated each model on data mimicking the point in time at which the models would be applied in clinical practice. We calculated the models’ discrimination and calibration-in-the-large performance.

Results

The dementia models obtained an area under the receiver operating characteristic curve (AUROC) of 0.560 and 0.897 for the case-control and cohort designs, respectively. The type 2 diabetes models obtained an AUROC of 0.733 and 0.727 for the case-control and cohort designs, respectively. The dementia and diabetes case-control models were both poorly calibrated, whereas the dementia cohort model achieved good calibration. We show that careful construction of a case-control design can lead to discriminative performance comparable to that of a cohort design, but case-control designs over-represent the outcome class, leading to miscalibration.
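The two metrics reported above can be illustrated with a minimal sketch. This is not the authors' code: the function names `auroc` and `calibration_in_the_large` and the toy data are assumptions made for illustration, and calibration-in-the-large is taken here in its simplest form, a comparison of the mean predicted risk with the observed outcome rate.

```python
def auroc(y_true, y_score):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive scores higher than a randomly chosen negative
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(y_true, y_prob):
    """Return (mean predicted risk, observed outcome rate).
    A well calibrated model has the two values close together."""
    mean_predicted = sum(y_prob) / len(y_prob)
    observed_rate = sum(y_true) / len(y_true)
    return mean_predicted, observed_rate


# Toy data, purely illustrative (not taken from the study).
labels = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]

print(auroc(labels, risks))
print(calibration_in_the_large(labels, risks))
```

A model trained on data in which the outcome class is over-represented will tend to return a mean predicted risk well above the outcome rate observed in the population where it is applied, which is the miscalibration pattern described for the case-control models.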

Conclusions

Any case-control design can be converted to a cohort design. We recommend that researchers with observational data use the less subjective and generally better calibrated cohort design when extracting labelled data. However, if a carefully constructed case-control design is used, then the model must be prospectively validated using a cohort design for fair evaluation and be recalibrated.
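The abstract does not say how recalibration should be done. One common option, shown here only as a sketch under that assumption, is an intercept (prior) correction on the log-odds scale: the predicted risk is shifted by the difference between the log-odds of the outcome rate in the training sample and in the target population. The function name `recalibrate_intercept` and the prevalence values are hypothetical.

```python
import math


def _logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def recalibrate_intercept(prob, train_prevalence, target_prevalence):
    """Shift a predicted risk from the outcome rate seen in training
    (e.g. an over-sampled case-control set) to the outcome rate of the
    target population. Only the intercept changes, so the ranking of
    patients, and therefore the AUROC, is unaffected."""
    z = _logit(prob) - _logit(train_prevalence) + _logit(target_prevalence)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical numbers: training set with 50% cases, population rate 10%.
print(recalibrate_intercept(0.5, train_prevalence=0.5, target_prevalence=0.1))
```

The target outcome rate would itself need to be estimated from cohort-style data, which is consistent with the recommendation to validate on a cohort design.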



Updated: 2021-08-19