Comparison of complex modeling strategies for prediction of a binary outcome based on a few, highly correlated predictors
Biometrical Journal (IF 1.3), Pub Date: 2020-03-30, DOI: 10.1002/bimj.201800243
Marco Chiabudini 1,2, Martin Schumacher 1, Erika Graf 1

Motivated by a clinical prediction problem, a simulation study was performed to compare different approaches for building risk prediction models. Robust prediction models for hospital survival in patients with acute heart failure were to be derived from three highly correlated blood parameters measured up to four times, with predictive ability having explicit priority over interpretability. Methods that relied only on the original predictors were compared with methods using an expanded predictor space including transformations and interactions. Predictors were simulated as transformations and combinations of multivariate normal variables which were fitted to the partly skewed and bimodally distributed original data in such a way that the simulated data mimicked the original covariate structure. Different penalized versions of logistic regression as well as random forests and generalized additive models were investigated using classical logistic regression as a benchmark. Their performance was assessed based on measures of predictive accuracy, model discrimination, and model calibration. Three different scenarios using different subsets of the original data with different numbers of observations and events per variable were investigated. In the investigated setting, where a risk prediction model should be based on a small set of highly correlated and interconnected predictors, Elastic Net and also Ridge logistic regression showed good performance compared to their competitors, while other methods did not lead to substantial improvements or even performed worse than standard logistic regression. Our work demonstrates how simulation studies that mimic relevant features of a specific data set can support the choice of a good modeling strategy.
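The kind of comparison described above can be illustrated with a small sketch. It is not the authors' simulation design: the correlation `rho`, the true coefficients, the sample sizes, and the penalty values `lam` are all assumptions chosen for illustration, and the predictors are plain multivariate normal draws rather than transformations fitted to the original data. Only standard and Ridge logistic regression are shown (fitted by a hand-written Newton-Raphson loop in NumPy); Elastic Net would need a solver that handles the L1 part of the penalty. The Brier score stands in for predictive accuracy and the AUC for discrimination.

```python
import numpy as np

def simulate(n, rho=0.9, seed=0):
    """Three highly correlated normal predictors and a binary outcome (assumed model)."""
    rng = np.random.default_rng(seed)
    cov = np.full((3, 3), rho) + (1.0 - rho) * np.eye(3)  # unit variances, correlation rho
    X = rng.multivariate_normal(np.zeros(3), cov, size=n)
    logit = -1.0 + 1.0 * X[:, 0] + 0.5 * X[:, 1] + 0.5 * X[:, 2]  # hypothetical truth
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
    return X, y

def fit_logistic(X, y, lam=0.0, n_iter=50):
    """Newton-Raphson logistic regression; lam is a ridge penalty on the slopes only."""
    Xd = np.column_stack([np.ones(len(X)), X])
    penalty = lam * np.eye(Xd.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (y - p) - penalty @ beta
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None]) + penalty
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def predict(beta, X):
    """Predicted event probabilities for new rows of X."""
    return 1.0 / (1.0 + np.exp(-(beta[0] + X @ beta[1:])))

def brier(y, p):
    """Mean squared difference between outcome and predicted probability."""
    return float(np.mean((p - y) ** 2))

def auc(y, p):
    """Area under the ROC curve via the rank-sum formula (assumes no tied predictions)."""
    order = np.argsort(p)
    ranks = np.empty(len(p))
    ranks[order] = np.arange(1, len(p) + 1)
    n1 = int(y.sum())
    n0 = len(y) - n1
    return float((ranks[y == 1].sum() - n1 * (n1 + 1) / 2.0) / (n1 * n0))

# Small training set, large independent test set; lam = 0 is standard logistic regression.
X_train, y_train = simulate(100, seed=1)
X_test, y_test = simulate(5000, seed=2)
for lam in (0.0, 1.0, 10.0):
    beta = fit_logistic(X_train, y_train, lam)
    p = predict(beta, X_test)
    print(f"lambda={lam:5.1f}  Brier={brier(y_test, p):.3f}  AUC={auc(y_test, p):.3f}")
```

A single run like this says little by itself; a simulation study in the spirit of the abstract would repeat it over many seeds, choose `lam` by cross-validation, and add a calibration measure.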

Updated: 2020-03-30