Relative Performance of Machine Learning and Linear Regression in Predicting Quality of Life and Academic Performance of School Children in Norway: Data Analysis of a Quasi-Experimental Study
Journal of Medical Internet Research (IF 5.8), Pub Date: 2021-07-16, DOI: 10.2196/22021
Robert Froud 1,2, Solveig Hakestad Hansen 1, Hans Kristian Ruud 1, Jonathan Foss 3, Leila Ferguson 1, Per Morten Fredriksen 1

Background: Machine learning techniques are increasingly being applied in health research. It is not clear how useful these approaches are for modeling continuous outcomes. Child quality of life is associated with parental socioeconomic status and physical activity and may be associated with aerobic fitness and strength. It is unclear whether diet or academic performance is associated with quality of life.

Objective: The purpose of this study was to compare the predictive performance of machine learning techniques with that of linear regression in examining the extent to which continuous outcomes (physical activity, aerobic fitness, muscular strength, diet, and parental education) are predictive of academic performance and quality of life and whether academic performance and quality of life are associated.

Methods: We modeled data from children attending 9 schools in a quasi-experimental study. We split data randomly into training and validation sets. Curvilinear, nonlinear, and heteroscedastic variables were simulated to examine the performance of machine learning techniques compared to that of linear models, with and without imputation.

Results: We included data for 1711 children. Regression models explained 24% of academic performance variance in the real complete-case validation set, and up to 15% in quality of life. While machine learning techniques explained high proportions of variance in training sets, in validation, machine learning techniques explained approximately 0% of academic performance and 3% to 8% of quality of life. With imputation, machine learning techniques improved to 15% for academic performance. Machine learning outperformed regression for simulated nonlinear and heteroscedastic variables. The best predictors of academic performance in adjusted models were the child’s mother having a master-level education (P<.001; β=1.98, 95% CI 0.25 to 3.71), increased television and computer use (P=.03; β=1.19, 95% CI 0.25 to 3.71), and dichotomized self-reported exercise (P=.001; β=2.47, 95% CI 1.08 to 3.87). For quality of life, self-reported exercise (P<.001; β=1.09, 95% CI 0.53 to 1.66) and increased television and computer use (P=.002; β=−0.95, 95% CI −1.55 to −0.36) were the best predictors. Adjusted academic performance was associated with quality of life (P=.02; β=0.12, 95% CI 0.02 to 0.22).

Conclusions: Linear regression was less prone to overfitting and outperformed commonly used machine learning techniques. Imputation improved the performance of machine learning, but not sufficiently to outperform regression. Machine learning techniques outperformed linear regression for modeling nonlinear and heteroscedastic relationships and may be of use in such cases. Regression with splines performed almost as well in nonlinear modeling. Lifestyle variables, including physical exercise, television and computer use, and parental education are predictive of academic performance or quality of life. Academic performance is associated with quality of life after adjusting for lifestyle variables and may offer another promising intervention target to improve quality of life in children.
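The gap between training and validation performance described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis: the data are simulated, the effect size and noise level are arbitrary assumptions, and a 1-nearest-neighbour regressor stands in for "a flexible learner that can memorise its training set". It uses only the Python standard library and compares the proportion of variance explained (R squared) on a random training and validation split.

```python
import random


def r_squared(y_true, y_pred):
    """Proportion of variance explained: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def fit_linear(xs, ys):
    """Ordinary least squares with one predictor; returns a predict function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_nearest_neighbour(xs, ys):
    """1-nearest-neighbour regressor: predicts the outcome of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def evaluate(fit, train, valid):
    """Fit on the training set; return (training R^2, validation R^2)."""
    model = fit(*train)
    train_r2 = r_squared(train[1], [model(x) for x in train[0]])
    valid_r2 = r_squared(valid[1], [model(x) for x in valid[0]])
    return train_r2, valid_r2


# Simulated data (an assumption for illustration): weak linear signal, heavy noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(400)]
ys = [0.5 * x + rng.gauss(0, 3) for x in xs]

# Random split into training and validation halves.
indices = list(range(len(xs)))
rng.shuffle(indices)
half = len(indices) // 2
train = ([xs[i] for i in indices[:half]], [ys[i] for i in indices[:half]])
valid = ([xs[i] for i in indices[half:]], [ys[i] for i in indices[half:]])

lin_train, lin_valid = evaluate(fit_linear, train, valid)
nn_train, nn_valid = evaluate(fit_nearest_neighbour, train, valid)

print(f"linear regression: train R^2 = {lin_train:.2f}, validation R^2 = {lin_valid:.2f}")
print(f"1-nearest neighbour: train R^2 = {nn_train:.2f}, validation R^2 = {nn_valid:.2f}")
```

In this setup the nearest-neighbour model reproduces its training outcomes exactly, so its training R squared says nothing about how it will generalise; only the validation figure is informative. Replacing the simulated outcome with a strongly nonlinear function of x would be one way to explore the opposite case the abstract mentions, where flexible methods outperform a straight-line fit.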

This is the abstract only. Read the full article on the JMIR site.


Updated: 2021-07-16