Inference from Non-Probability Surveys with Statistical Matching and Propensity Score Adjustment Using Modern Prediction Techniques
Mathematics (IF 2.3), Pub Date: 2020-06-01, DOI: 10.3390/math8060879
Luis Castro-Martín, Maria del Mar Rueda, Ramón Ferri-García

Online surveys are increasingly common in social and health studies, as they provide fast and inexpensive results in comparison to traditional ones. However, these surveys often work with biased samples, because data collection is usually nonprobabilistic: internet coverage is lacking in certain population groups, and many online surveys rely on self-selection. Some procedures have been proposed to mitigate the bias, such as propensity score adjustment (PSA) and statistical matching. In PSA, the propensity to participate in a nonprobability survey is estimated using a probability reference survey, and then used to obtain weighted estimates. In statistical matching, the nonprobability sample is used to train models that predict the values of the target variable, and the models' predictions for the probability sample are used to estimate population values. In this study, both methods are compared using three datasets to simulate pseudopopulations from which nonprobability and probability samples are drawn and used to estimate population parameters. In addition, the study compares the use of linear models and Machine Learning prediction algorithms for propensity estimation in PSA and for predictive modeling in statistical matching. The results show that statistical matching outperforms PSA in terms of bias reduction and Root Mean Square Error (RMSE), and that simpler prediction models, such as linear models and k-Nearest Neighbors, provide better outcomes than bagging algorithms.
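
To make the two adjustments concrete, the following is a minimal sketch on simulated data, not the authors' implementation. Everything in it is an assumption made for illustration: the pseudopopulation, the single covariate `x`, the selection mechanism, the use of plain logistic and linear models, and the PSA weight formula (1 - π)/π, which is one of several in use. It compares the unadjusted mean of the nonprobability sample with a PSA-weighted mean and a statistical matching estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pseudopopulation: one covariate x drives both the target y
# and the willingness to answer the online survey.
N = 100_000
x = rng.normal(size=N)
y = 2.0 + 1.5 * x + rng.normal(size=N)
true_mean = y.mean()

# Probability reference sample: simple random sample (y is treated as unobserved).
ref = rng.choice(N, size=1_000, replace=False)

# Nonprobability sample: self-selection, more likely for large x (assumed mechanism).
p_select = 1.0 / (1.0 + np.exp(4.0 - x))
vol = np.flatnonzero(rng.random(N) < p_select)


def design(v):
    """Design matrix with an intercept column."""
    return np.column_stack([np.ones(len(v)), v])


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic(X, z, n_iter=25):
    """Logistic regression fitted with plain Newton-Raphson steps."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hessian, X.T @ (z - p))
    return beta


# Unadjusted estimate from the nonprobability sample.
naive = y[vol].mean()

# PSA: pool both samples, model membership in the nonprobability sample,
# then weight each volunteer by (1 - pi) / pi and take a normalized mean.
X_pool = design(np.concatenate([x[vol], x[ref]]))
z = np.concatenate([np.ones(len(vol)), np.zeros(len(ref))])
beta = fit_logistic(X_pool, z)
pi = sigmoid(design(x[vol]) @ beta)
w = (1.0 - pi) / pi
psa = np.sum(w * y[vol]) / np.sum(w)

# Statistical matching: train y ~ x on the nonprobability sample,
# predict y for the reference sample, and average the predictions.
coef, *_ = np.linalg.lstsq(design(x[vol]), y[vol], rcond=None)
matching = (design(x[ref]) @ coef).mean()

print(f"true mean       {true_mean:.3f}")
print(f"naive           {naive:.3f}")
print(f"PSA             {psa:.3f}")
print(f"stat. matching  {matching:.3f}")
```

In this toy setup the outcome model is correctly specified, so it says nothing about the relative performance reported in the study; it only shows where each method uses the reference sample. The logistic and linear fits could be swapped for other predictors, such as k-Nearest Neighbors or bagging, without changing the surrounding steps.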

Updated: 2020-06-01