Developing risk models for multicenter data using standard logistic regression produced suboptimal predictions: A simulation study, Biometrical Journal

Biometrical Journal (IF 1.3), Pub Date: 2020-01-20, DOI: 10.1002/bimj.201900075
Nora Falconieri (1), Ben Van Calster (1, 2), Dirk Timmerman (1, 3), Laure Wynants (1, 4)

Abstract

Although multicenter data are common, many prediction model studies ignore this clustering during model development. The objective of this study is to evaluate the predictive performance of regression methods for developing clinical risk prediction models from multicenter data, and to provide guidelines for practice. We compared the predictive performance of standard logistic regression, generalized estimating equations, random intercept logistic regression, and fixed effects logistic regression. First, we presented a case study on the diagnosis of ovarian cancer. Subsequently, a simulation study investigated the performance of the different models as a function of the amount of clustering, development sample size, distribution of center-specific intercepts, the presence of a center-predictor interaction, and the presence of a dependency between center effects and predictors. The results showed that when sample sizes were sufficiently large, conditional models (random intercept and fixed effects logistic regression) yielded calibrated predictions, whereas marginal models (standard logistic regression and generalized estimating equations) yielded miscalibrated predictions. Small sample sizes led to overfitting and unreliable predictions. The miscalibration of marginal models was worse with more heavily clustered data. Calibration of random intercept logistic regression was better than that of standard logistic regression even when center-specific intercepts were not normally distributed, a center-predictor interaction was present, center effects and predictors were dependent, or the model was applied in a new center. Therefore, to make reliable predictions in a specific center, we recommend random intercept logistic regression.
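The contrast between marginal and conditional models can be illustrated with a small simulation. The sketch below is not the authors' simulation code; it is a minimal pure-Python illustration under stated assumptions: one standard-normal predictor, a normally distributed random intercept per center, illustrative coefficients beta0 = -1 and beta1 = 1, and a random-intercept variance derived from a latent-scale intraclass correlation via σ² = ICC·(π²/3)/(1 - ICC). It fits standard logistic regression (a marginal model that ignores centers) by Newton-Raphson, then reports the fitted slope, the commonly used approximation β_marginal ≈ β_conditional / √(1 + c²σ²) with c = 16√3/(15π), and the mean absolute gap between observed and predicted event rates within centers. All function names (`simulate_multicenter`, `fit_standard_logistic`, `mean_center_gap`) are hypothetical.

```python
import math
import random


def expit(z):
    """Inverse logit, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate_multicenter(n_centers, n_per_center, icc, beta0=-1.0, beta1=1.0, seed=1):
    """Assumed generating model: logit P(y=1 | x, u_j) = beta0 + beta1*x + u_j.

    The random-intercept variance comes from a latent-scale ICC:
    icc = s2 / (s2 + pi^2/3)  =>  s2 = icc * (pi^2/3) / (1 - icc).
    """
    rng = random.Random(seed)
    s2 = icc * (math.pi ** 2 / 3) / (1 - icc)
    rows = []  # (center, x, y)
    for center in range(n_centers):
        u = rng.gauss(0.0, math.sqrt(s2))  # center-specific intercept shift
        for _ in range(n_per_center):
            x = rng.gauss(0.0, 1.0)
            y = 1 if rng.random() < expit(beta0 + beta1 * x + u) else 0
            rows.append((center, x, y))
    return rows, s2


def fit_standard_logistic(rows, iters=50, tol=1e-10):
    """Standard logistic regression ignoring center, fitted by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for _, x, y in rows:
            p = expit(b0 + b1 * x)
            w = p * (1 - p)
            g0 += y - p          # score for the intercept
            g1 += (y - p) * x    # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def mean_center_gap(rows, b0, b1):
    """Mean |observed event rate - mean predicted risk| across centers."""
    by_center = {}
    for center, x, y in rows:
        obs, pred, n = by_center.get(center, (0, 0.0, 0))
        by_center[center] = (obs + y, pred + expit(b0 + b1 * x), n + 1)
    gaps = [abs(o - p) / n for o, p, n in by_center.values()]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    c = 16 * math.sqrt(3) / (15 * math.pi)  # constant in the attenuation approximation
    for icc in (0.0, 0.1, 0.3, 0.5):
        rows, s2 = simulate_multicenter(n_centers=50, n_per_center=200, icc=icc)
        b0, b1 = fit_standard_logistic(rows)
        approx = 1.0 / math.sqrt(1 + c * c * s2)  # approx. marginal/conditional slope ratio
        print(f"ICC={icc:.1f}  slope={b1:.3f}  approx ratio={approx:.3f}  "
              f"center gap={mean_center_gap(rows, b0, b1):.3f}")
```

Under these assumptions, the fitted slope should shrink and the within-center gap should grow as the ICC increases, mirroring the finding that marginal models are miscalibrated within centers, and more so with heavier clustering. Fitting the conditional models (random intercept or fixed effects logistic regression) needs a mixed-model or multi-parameter solver (for example, `glmer` from the R package lme4) and is left out to keep the sketch dependency-free.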

Updated: 2020-01-20
