Contrasting case-wise deletion with multiple imputation and latent variable approaches to dealing with missing observations in count regression models
Analytic Methods in Accident Research (IF 12.9), Pub Date: 2019-08-17, DOI: 10.1016/j.amar.2019.100104
Amir Pooyan Afghari, Simon Washington, Carlo Prato, Md Mazharul Haque

Missing data can lead to biased and inefficient parameter estimates in statistical models, depending on the missing data mechanism. Count regression models are no exception, with missing data leading to incorrect inferences about the effects of explanatory variables. A convenient approach for dealing with missing data is to remove observations with incomplete records prior to the analysis – often referred to as case-wise deletion. Removing incomplete records, however, reduces the sample size, increases standard errors and, if data are not missing completely at random, produces biased parameter estimates. A more complex approach is multiple imputation, which provides an estimate of the modelling uncertainty created by the data ‘missing-ness’, as distinct from the natural variation in the data. However, multiple imputation produces biased parameter estimates if the probability of missing data is related to the observed data – or is endogenous. Latent variable modelling has recently been introduced as an alternative approach for dealing with missing data, but it comes at a high computational cost and complexity.
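The separation of imputation-induced uncertainty from natural variation described above is usually computed with Rubin's rules, which pool one parameter across the imputed datasets. The following is a minimal sketch, not the authors' implementation; the function name `pool_rubin` and the numeric inputs are hypothetical and purely illustrative.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    estimates: point estimates of the parameter, one per imputed dataset.
    variances: squared standard errors of those estimates.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average sampling variance (natural variation)
    between = variance(estimates)   # sample variance across imputations (missing-ness)
    total = within + (1 + 1 / m) * between
    return pooled, within, between, total


# Hypothetical coefficient estimates and squared standard errors from m = 4 imputations.
estimates = [0.50, 0.54, 0.46, 0.50]
variances = [0.010, 0.012, 0.008, 0.010]
pooled, within, between, total = pool_rubin(estimates, variances)
print(pooled, within, between, total)
```

The between-imputation term is the part of the total variance attributable to the missing data; case-wise deletion offers no comparable decomposition.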

Despite fairly extensive methodological advancements in the statistical literature, case-wise deletion is commonly employed to deal with missing data in statistical models of transport, while the multiple imputation and latent variable approaches remain relatively unexplored. More importantly, the performance of these approaches has not been tested across different types of data missing-ness. To address these gaps, this study contrasts case-wise deletion with multiple imputation and latent variable approaches in dealing with missing data in count regression models. We compare the performance of these three approaches using crash count models estimated against empirical data obtained from state-controlled roads in Queensland, Australia. A quasi-experimental evaluation of data missing-ness is then conducted by extracting three data subsets from the original dataset, each with a unique missing data mechanism (with terminology adopted from the statistical literature): missing completely at random, missing at random, and missing not at random. The three approaches are then applied to each data subset and the results are compared in terms of bias, precision of parameter estimates, and goodness-of-fit. The findings indicate that multiple imputation is the most effective approach when data are missing either completely at random or at random, whereas the latent variable approach is more effective when data are missing not at random. However, the effectiveness of the latent variable approach is dependent on the availability of suitable variables as instruments in the data.
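The three missing data mechanisms can be illustrated on synthetic data. The sketch below is an assumption-laden illustration, not the procedure used in the study: the variables `x1` and `x2`, the 30% deletion rate, and the logistic deletion rules are all hypothetical. It removes values of `x2` under each mechanism and then applies case-wise deletion to estimate the mean of `x2`.

```python
import math
import random


def make_missing(rows, mechanism, rng, rate=0.3):
    """Return a copy of rows with x2 set to None under the given mechanism.

    rows: list of (x1, x2) tuples; x1 is always observed.
    """
    out = []
    for x1, x2 in rows:
        if mechanism == "MCAR":
            p = rate                               # constant deletion probability
        elif mechanism == "MAR":
            p = 1 / (1 + math.exp(-(x1 - 1.0)))    # depends on the observed x1 only
        elif mechanism == "MNAR":
            p = 1 / (1 + math.exp(-(x2 - 1.0)))    # depends on the value being deleted
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append((x1, None if rng.random() < p else x2))
    return out


def observed_mean(data):
    """Case-wise deletion: average x2 over the complete records only."""
    vals = [x2 for _, x2 in data if x2 is not None]
    return sum(vals) / len(vals)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5000)]
full_mean = sum(x2 for _, x2 in rows) / len(rows)

subsets = {m: make_missing(rows, m, rng) for m in ("MCAR", "MAR", "MNAR")}
means = {m: observed_mean(d) for m, d in subsets.items()}
print(full_mean, means)
```

In this toy setting the complete-case mean stays close to the full-data mean under MCAR, whereas under MNAR it is pulled downward because larger values of `x2` are more likely to be deleted.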




Updated: 2019-08-17