Construction and assessment of prediction rules for binary outcome in the presence of missing predictor data using multiple imputation and cross‐validation: Methodological approach and data‐based evaluation
Biometrical Journal (IF 1.7), Pub Date: 2020-02-13, DOI: 10.1002/bimj.201800289
Bart J A Mertens 1 , Erika Banzato 2 , Liesbeth C de Wreede 1

Abstract We investigate calibration and assessment of predictive rules when missing values are present in the predictors. Our paper has two key objectives. The first is to investigate how the calibration of the prediction rule can be combined with use of multiple imputation to account for missing predictor observations. The second objective is to propose such methods that can be implemented with current multiple imputation software, while allowing for unbiased predictive assessment through validation on new observations for which outcome is not yet available. We commence with a review of the methodological foundations of multiple imputation as a model estimation approach as opposed to a purely algorithmic description. We specifically contrast application of multiple imputation for parameter (effect) estimation with predictive calibration. Based on this review, two approaches are formulated, of which the second utilizes application of the classical Rubin's rules for parameter estimation, while the first approach averages probabilities from models fitted on single imputations to directly approximate the predictive density for future observations. We present implementations using current software that allow for validation and estimation of performance measures by cross‐validation, as well as imputation of missing data in predictors on the future data where outcome is missing by definition. To simplify, we restrict discussion to binary outcome and logistic regression throughout. Method performance is verified through application on two real data sets. Accuracy (Brier score) and variance of predicted probabilities are investigated. Results show substantial reductions in variation of calibrated probabilities when using the first approach.
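The contrast between the two approaches can be illustrated with a minimal sketch. It assumes that M logistic models have already been fitted, one per imputed data set, and that the new observation has likewise been imputed M times. All function names and numbers below are hypothetical and are not taken from the paper or its software. In particular, applying the pooled coefficients to each imputed copy of the new observation and averaging the results is an assumption made here for illustration. The first function averages predicted probabilities across the single-imputation models; the second pools coefficients using the point-estimate part of Rubin's rules (the mean across imputations) before predicting.

```python
import math


def sigmoid(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(coef, x):
    """Logistic prediction; coef[0] is the intercept, coef[1:] the slopes."""
    return sigmoid(coef[0] + sum(b * v for b, v in zip(coef[1:], x)))


def averaged_probability(coefs, imputed_xs):
    """Average the probabilities from the M single-imputation models,
    each paired with one imputed copy of the new observation."""
    probs = [predict(c, x) for c, x in zip(coefs, imputed_xs)]
    return sum(probs) / len(probs)


def pooled_coefficients(coefs):
    """Rubin's rules point estimate: mean coefficient vector over imputations."""
    m = len(coefs)
    return [sum(c[j] for c in coefs) / m for j in range(len(coefs[0]))]


def pooled_probability(coefs, imputed_xs):
    """Predict with the pooled coefficients on each imputed copy of the new
    observation, then average (an assumption of this sketch)."""
    pooled = pooled_coefficients(coefs)
    probs = [predict(pooled, x) for x in imputed_xs]
    return sum(probs) / len(probs)


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Hypothetical coefficients (intercept, slope) from M = 3 imputed data sets.
coefs = [[-1.0, 2.0], [0.0, 1.0], [1.0, 0.5]]
# Hypothetical imputed predictor values for one new observation.
imputed_xs = [[1.0], [1.0], [1.0]]

p_averaged = averaged_probability(coefs, imputed_xs)
p_pooled = pooled_probability(coefs, imputed_xs)
print("averaged probabilities:", p_averaged)
print("pooled coefficients:   ", p_pooled)
```

Because the logistic function is nonlinear, the two predictions generally differ even when they are built from the same fitted models. In a cross-validation setting, `brier_score` would be applied to the held-out predictions from either approach.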

Updated: 2020-02-13