Generalized Linear Mixed Models for Randomized Responses
Abstract
Response bias (nonresponse and social desirability bias) is one of the main concerns when asking sensitive questions about behavior and attitudes. Self-reports on sensitive issues, as in health research (e.g., drug and alcohol abuse) and the social and behavioral sciences (e.g., attitudes toward refugees, academic cheating), can be expected to be subject to considerable misreporting. To reduce misreporting in self-reports, indirect questioning techniques such as the randomized response techniques have been proposed. Randomized response techniques avoid a direct link between an individual's response and the sensitive question, thereby protecting the individual's privacy. Alongside the development of these data collection methods, methodological advances have made it possible to relate responses to sensitive questions to other variables in a multivariate analysis. It is shown that these developments can be represented by a general response probability model, covering all common designs, which is extended to a generalized linear model (GLM) or a generalized linear mixed model (GLMM). The general methodology modifies common link functions so that a linear predictor is related to the randomized response. This approach makes it possible to use existing software for GLMs and GLMMs to model randomized response data. The R package GLMMRR makes the methodology available to applied researchers. The extended models and software are expected to substantially improve the application of the randomized response methodology. Three empirical examples illustrate the methods.
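The idea of a modified link function can be illustrated with a small sketch. It assumes that the general response probability model has the linear form P(observed "yes") = c + d * pi, where pi is the probability of a true "yes" and c and d are constants fixed by the randomizing device. This parametrization, the function names, and the forced-response design values below are illustrative assumptions; they are not taken from the abstract and do not reproduce the GLMMRR interface.

```python
import math


def rr_inverse_link(eta, c, d):
    """Map a linear predictor eta to the probability of an observed 'yes'.

    Assumes the observed probability is c + d * pi, where pi is the
    logistic transform of eta (the true sensitive-response probability)
    and c, d are known design constants (hypothetical parametrization).
    """
    pi = 1.0 / (1.0 + math.exp(-eta))  # ordinary inverse logit
    return c + d * pi


def rr_link(p_obs, c, d):
    """Modified logit link: map an observed 'yes' probability back to eta."""
    pi = (p_obs - c) / d  # undo the randomization step
    if not 0.0 < pi < 1.0:
        raise ValueError("p_obs is outside the range implied by c and d")
    return math.log(pi / (1.0 - pi))  # ordinary logit


# Hypothetical forced-response design: a die forces 'yes' with
# probability 1/6, forces 'no' with probability 1/6, and asks for a
# truthful answer otherwise, so c = 1/6 and d = 4/6.
c, d = 1 / 6, 4 / 6
p_observed = rr_inverse_link(0.5, c, d)
eta_recovered = rr_link(p_observed, c, d)
```

Under this assumption, setting c = 0 and d = 1 recovers the ordinary logit link for direct questioning, which suggests why standard GLM and GLMM fitting routines can be reused once the link is adjusted.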