How to apply variable selection machine learning algorithms with multiply imputed data: A missing discussion.
Psychological Methods (IF 7.6), Pub Date: 2022-02-03, DOI: 10.1037/met0000478
Heather J. Gunn, Panteha Hayati Rezvan, M. Isabel Fernández, W. Scott Comulada
Psychological researchers often use standard linear regression to identify relevant predictors of an outcome of interest, but challenges emerge with incomplete data and growing numbers of candidate predictors. Regularization methods like the LASSO can reduce the risk of overfitting, increase model interpretability, and improve prediction in future samples; however, handling missing data when using regularization-based variable selection methods is complicated. Using listwise deletion or an ad hoc imputation strategy to deal with missing data when using regularization methods can lead to loss of precision, substantial bias, and a reduction in predictive ability. In this tutorial, we describe three approaches for fitting a LASSO when using multiple imputation to handle missing data and illustrate how to implement these approaches in practice with an applied example. We discuss implications of each approach and describe additional research that would help solidify recommendations for best practices.
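To make the problem concrete, the sketch below illustrates one plausible approach of the kind the tutorial discusses: fit a LASSO separately on each multiply imputed dataset and retain predictors selected in a majority of imputations. This is an illustrative assumption, not the authors' exact procedure; the simulated data, the majority-vote rule, and the use of scikit-learn's `IterativeImputer` with `sample_posterior=True` as a stand-in for multiple imputation are all choices made here for demonstration.

```python
# Illustrative sketch (not the paper's exact method): LASSO variable selection
# across multiply imputed datasets with a majority-vote pooling rule.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)

# Simulate data where y depends on the first two of six candidate predictors.
n, p = 200, 6
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Introduce ~15% missingness completely at random in the predictors.
X_miss = X.copy()
X_miss[rng.random(size=(n, p)) < 0.15] = np.nan

m = 5  # number of imputations
selected_counts = np.zeros(p)
for i in range(m):
    # sample_posterior=True draws stochastic imputations, approximating
    # the variability of proper multiple imputation.
    imputer = IterativeImputer(sample_posterior=True, random_state=i)
    X_imp = imputer.fit_transform(X_miss)
    lasso = LassoCV(cv=5, random_state=0).fit(X_imp, y)
    selected_counts += (np.abs(lasso.coef_) > 1e-8)

# Majority-vote rule: keep predictors selected in more than half the imputations.
selected = np.where(selected_counts > m / 2)[0]
print("selected predictors:", selected)
```

With a strong simulated signal, the two true predictors survive the vote; how best to pool selection decisions (or coefficient estimates) across imputations is precisely the kind of open question the tutorial addresses.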
