The more data, the better? Demystifying deletion-based methods in linear regression with missing data
Statistics and Its Interface (IF 0.3), Pub Date: 2022-03-04, DOI: 10.4310/21-sii717
Tianchen Xu, Kun Chen, Gen Li

We compare two deletion-based methods for dealing with missing observations in linear regression analysis. One is the complete-case analysis (CC, or listwise deletion), which discards all incomplete observations and uses only the fully observed samples for ordinary least-squares estimation. The other is the available-case analysis (AC, or pairwise deletion), which uses all available data to estimate the covariance matrices and plugs these estimates into the normal equations. We show that the estimates from both methods are asymptotically unbiased under missing completely at random (MCAR) and further compare their asymptotic variances in several typical situations. Surprisingly, using more data (i.e., AC) does not necessarily lead to better asymptotic efficiency in many scenarios. Missing patterns, the covariance structure, and the true regression coefficient values all play a role in determining which method is better. We further conduct simulation studies to corroborate these findings and demystify what has been missed or misinterpreted in the literature. Some detailed proofs and simulation results are available in the online supplemental materials.
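
To make the contrast between the two estimators concrete, here is a minimal numerical sketch (not the authors' code) of CC and AC estimation under MCAR missingness. The simulation setup, the 30% per-entry missingness rate, and all variable names are illustrative assumptions, not taken from the paper.

```python
# Sketch: complete-case (CC) vs. available-case (AC) estimation under MCAR.
# Assumptions: two mean-zero predictors, no intercept, 30% MCAR missingness in X.
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 2
beta_true = np.array([1.0, -0.5])

# Generate correlated predictors and a response, then impose MCAR missingness.
Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
y = X @ beta_true + rng.normal(size=n)
X_mis = X.copy()
X_mis[rng.random((n, p)) < 0.3] = np.nan  # entries missing completely at random

# Complete-case (listwise deletion): OLS on rows with no missing entries.
cc_rows = ~np.isnan(X_mis).any(axis=1)
Xc, yc = X_mis[cc_rows], y[cc_rows]
beta_cc = np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)

# Available-case (pairwise deletion): estimate each covariance entry from all
# pairwise-complete observations, then solve the normal equations
#   Sigma_xx @ beta = sigma_xy.
def pairwise_cov(a, b):
    ok = ~np.isnan(a) & ~np.isnan(b)
    return np.cov(a[ok], b[ok])[0, 1]

cols = [X_mis[:, j] for j in range(p)]
Sigma_xx = np.array([[pairwise_cov(cols[j], cols[k]) for k in range(p)]
                     for j in range(p)])
sigma_xy = np.array([pairwise_cov(cols[j], y) for j in range(p)])
beta_ac = np.linalg.solve(Sigma_xx, sigma_xy)

print("CC estimate:", beta_cc)
print("AC estimate:", beta_ac)
```

In this setup both estimators recover the true coefficients as the sample size grows, while their variances depend on the missing pattern, the covariance structure, and the true coefficient values, which is exactly the comparison the paper formalizes.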
