Validation and generalizability of machine learning prediction models on attrition in longitudinal studies, International Journal of Behavioral Development



International Journal of Behavioral Development (IF 2.4), Pub Date: 2022-02-07, DOI: 10.1177/01650254221075034
Kristin Jankowsky, Ulrich Schroeders


Attrition in longitudinal studies is a major threat to the representativeness of the data and the generalizability of the findings. Typical approaches to addressing systematic nonresponse are either expensive and unsatisfactory (e.g., oversampling) or rely on the unrealistic assumption that data are missing at random (e.g., multiple imputation). Models that effectively predict who is most likely to drop out at subsequent occasions might therefore offer the opportunity to take countermeasures (e.g., incentives). In the current study, we introduce a longitudinal model validation approach and examine whether attrition in two nationally representative longitudinal panel studies can be predicted accurately. We compare the performance of a basic logistic regression model with that of a more flexible, data-driven machine learning algorithm, gradient boosting machines. Our results show almost no difference in accuracy between the two modeling approaches, which contradicts claims made in similar studies on survey attrition. Prediction models could not be generalized across surveys and were less accurate when tested at a later survey wave. We discuss the implications of these findings for survey retention and for the use of complex machine learning algorithms, and we give some recommendations for dealing with study attrition.
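The abstract does not spell out the validation procedure, so the following is only a minimal sketch of one way a longitudinal check could look: fit a dropout model on one wave transition, then score it on a holdout from the same wave and on a later wave. Everything here is an assumption made for illustration: the data are synthetic, the weights and function names (`simulate_transition`, `fit_logistic`, `run_demo`) are hypothetical, and only a plain logistic regression is fitted. The gradient boosting comparison is left out to keep the example free of third-party dependencies.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_transition(rng, n, weights, bias):
    """Synthetic respondents: predictors x and a dropout flag (1 = dropped out)."""
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in weights]
        p_drop = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
        X.append(x)
        y.append(1 if rng.random() < p_drop else 0)
    return X, y


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Unregularized logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * v for wj, v in zip(w, xi))) - yi
            grad_w = [g + err * v for g, v in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc(scores, labels):
    """Probability that a random dropout gets a higher score than a random stayer."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def run_demo(seed=0):
    rng = random.Random(seed)
    # Hypothetical data-generating weights; the later wave drifts on purpose.
    early = dict(weights=[1.0, -0.8, 0.5], bias=-1.0)
    later = dict(weights=[0.6, -0.8, 0.1], bias=-0.5)
    X_train, y_train = simulate_transition(rng, 1000, **early)
    X_same, y_same = simulate_transition(rng, 500, **early)
    X_later, y_later = simulate_transition(rng, 500, **later)
    w, b = fit_logistic(X_train, y_train)
    score = lambda X: [b + sum(wj * v for wj, v in zip(w, x)) for x in X]
    return auc(score(X_same), y_same), auc(score(X_later), y_later)


auc_same, auc_later = run_demo()
print(f"AUC, same wave holdout: {auc_same:.3f}")
print(f"AUC, later wave:        {auc_later:.3f}")
```

Any gap between the two numbers is built into the simulation through the shifted weights and says nothing about the study's actual results. With real panel data, the same two-way comparison could be repeated for a second model class to see whether the added flexibility pays off.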


Updated: 2022-02-07
