Evaluating Machine Learning methods for estimation in online surveys with superpopulation modeling
Mathematics and Computers in Simulation (IF 4.4), Pub Date: 2021-08-01, DOI: 10.1016/j.matcom.2020.03.005
Ramón Ferri-García, Luis Castro-Martín, María del Mar Rueda

Abstract Online surveys, despite their advantages in cost and effort, are particularly prone to selection bias because the target population differs from the population that can potentially be covered (the online population). As a result, estimates from online samples are unreliable unless further adjustments are applied. Several techniques have emerged in recent years to address this issue, among which superpopulation modeling can be useful in Big Data contexts where censuses are accessible. This technique uses the sample to train a model that captures the behavior of the target variable to be estimated, and applies the model to the nonsampled individuals to obtain population-level estimates. The modeling step has usually been done with linear regression or LASSO models, but machine learning (ML) algorithms have been pointed out as promising alternatives. In this study we examine the use of these algorithms in the online survey context in order to evaluate and compare their performance and their adequacy to the problem. A simulation study shows that ML algorithms can effectively reduce volunteering bias to a greater extent than traditional methods in several scenarios.
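
The model-based estimation idea described in the abstract (train on the sample, predict for the nonsampled individuals, combine both into a population-level estimate) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function name `model_based_mean`, the synthetic population, the logistic volunteering mechanism, and the use of ordinary least squares as the stand-in model. Any regression or ML learner could take its place.

```python
import numpy as np


def model_based_mean(X_sample, y_sample, X_nonsample):
    """Hypothetical helper: estimate the population mean of y.

    Observed y values are used for sampled units; a linear model fitted
    on the sample supplies predictions for the nonsampled units.
    """
    # Fit ordinary least squares with an intercept on the sample.
    A = np.column_stack([np.ones(len(X_sample)), X_sample])
    beta, *_ = np.linalg.lstsq(A, y_sample, rcond=None)

    # Predict the target variable for every nonsampled individual.
    B = np.column_stack([np.ones(len(X_nonsample)), X_nonsample])
    y_pred = B @ beta

    # Combine observed and predicted values into a population mean.
    total = y_sample.sum() + y_pred.sum()
    return total / (len(y_sample) + len(X_nonsample))


# Synthetic population (assumption): one covariate known for everyone,
# as it would be when a census is available.
rng = np.random.default_rng(0)
N = 10_000
x = rng.normal(size=N)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=N)

# Volunteering mechanism (assumption): individuals with larger x are
# more likely to take part, so the raw sample over-represents them.
p_volunteer = 1.0 / (1.0 + np.exp(-(x - 1.0)))
in_sample = rng.random(N) < p_volunteer

true_mean = y.mean()
naive_mean = y[in_sample].mean()
adjusted_mean = model_based_mean(
    x[in_sample].reshape(-1, 1),
    y[in_sample],
    x[~in_sample].reshape(-1, 1),
)

print("true mean:    ", true_mean)
print("naive mean:   ", naive_mean)
print("adjusted mean:", adjusted_mean)
```

In this toy setting the model is correctly specified and participation depends only on the covariate, so the adjusted estimate lands much closer to the true mean than the unadjusted sample mean does. That favorable setup is a property of the sketch, not a claim about real online surveys.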

Updated: 2021-08-01