Variable selection in Propensity Score Adjustment to mitigate selection bias in online surveys
Statistical Papers ( IF 1.3 ) Pub Date : 2022-03-02 , DOI: 10.1007/s00362-022-01296-x
Ramón Ferri-García 1 , María del Mar Rueda 1

The development of new survey data collection methods such as online surveys has been particularly advantageous for social studies in terms of reduced costs, immediacy and enhanced questionnaire possibilities. However, many such methods are strongly affected by selection bias, leading to unreliable estimates. Calibration and Propensity Score Adjustment (PSA) have been proposed as methods to remove selection bias in online nonprobability surveys. Calibration requires population totals to be known for the auxiliary variables used in the procedure, while PSA estimates the volunteering propensity of an individual using predictive modelling. The variables included in these models must be carefully selected in order to maximise the accuracy of the final estimates. This study presents an application, using synthetic and real data, of variable selection techniques developed for knowledge discovery in data to choose the best subset of variables for propensity estimation. We also compare the performance of PSA using different classification algorithms, after which calibration is applied. We also present an application of this methodology in a real-world situation, using it to obtain estimates of population parameters. The results obtained show that variable selection using appropriate methods can provide less biased and more efficient estimates than using all available covariates.
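The propensity step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic population, the participation mechanism, the single covariate `x`, the plain logistic regression fitted by Newton-Raphson, and the weight `(1 - p) / p`, which is one common PSA weighting choice among several. Variable selection and the subsequent calibration step are not shown.

```python
import math
import random

rng = random.Random(0)

# Hypothetical synthetic population: x is an auxiliary covariate, y the target.
N = 20000
pop_x = [rng.gauss(0.0, 1.0) for _ in range(N)]
pop_y = [2.0 + x + rng.gauss(0.0, 0.5) for x in pop_x]
true_mean = sum(pop_y) / N

# Volunteer (nonprobability) sample: participation depends on x
# (assumed mechanism, chosen only to create selection bias).
volunteers = [i for i in range(N)
              if rng.random() < min(1.0, 0.1 * math.exp(0.8 * pop_x[i]))]
# Reference probability sample: simple random sample; only x is used from it.
reference = rng.sample(range(N), 2000)

# Pool both samples; z = 1 marks volunteers, z = 0 marks reference units.
xs = [pop_x[i] for i in volunteers] + [pop_x[i] for i in reference]
zs = [1.0] * len(volunteers) + [0.0] * len(reference)


def fit_logistic(xs, zs, iters=25):
    """Newton-Raphson fit of logit P(z = 1 | x) = a + b * x."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, z in zip(xs, zs):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += z - p            # gradient, intercept
            g1 += (z - p) * x      # gradient, slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


a, b = fit_logistic(xs, zs)


def propensity(x):
    """Estimated probability that a pooled unit with covariate x is a volunteer."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


# Inverse-odds weights for volunteers, then a weighted (Hajek-type) mean.
weights = [(1.0 - propensity(pop_x[i])) / propensity(pop_x[i]) for i in volunteers]
naive_mean = sum(pop_y[i] for i in volunteers) / len(volunteers)
psa_mean = sum(w * pop_y[i] for w, i in zip(weights, volunteers)) / sum(weights)

print(f"true={true_mean:.3f} naive={naive_mean:.3f} psa={psa_mean:.3f}")
```

In this toy setting the only covariate is the one driving participation, so the propensity model is correctly specified by construction. With many candidate covariates, which ones enter the model becomes the question the paper studies.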




Updated: 2022-03-02