Abstract
This paper considers multiple imputation (MI) approaches for handling non-monotone missing longitudinal binary responses when estimating parameters of a marginal model using generalized estimating equations (GEE). GEE has been shown to yield consistent estimates of the regression parameters for a marginal model when data are missing completely at random (MCAR). However, when data are missing at random (MAR), the GEE estimates may not be consistent; the MI approaches proposed in this paper minimize bias under MAR. The first MI approach proposed is based on a multivariate normal distribution, with pairwise products among the binary outcomes added to the multivariate normal vector. Even though the multivariate normal does not impute 0 or 1 values for the missing binary responses, as discussed by Horton et al. (Am Stat 57:229–232, 2003), we suggest not rounding when filling in the missing binary data because rounding could increase bias. The second MI approach considered is the fully conditional specification (FCS) approach. In this approach, we specify a logistic regression model for each outcome given the outcomes at other time points and the covariates. Typically, one would include only main effects of the outcomes at the other times as predictors in the FCS approach, but we explore whether bias can be reduced by also including pairwise interactions of the outcomes at other time points in the FCS approach. In a study of asymptotic bias with non-monotone missing data, the proposed MI approaches are also compared to GEE without imputation. Finally, the proposed methods are illustrated using data from a longitudinal clinical trial comparing four psychosocial treatments from the National Institute on Drug Abuse Collaborative Cocaine Treatment Study, where patients’ cocaine use is collected monthly for 6 months during treatment.
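As a rough illustration of the first approach, the sketch below builds the augmented vector to which a multivariate normal imputation model could be fitted: the binary outcomes followed by their pairwise products. The function name, the use of `None` for a missing response, and the choice to treat a product as missing whenever either outcome is missing are assumptions made for this example; the abstract does not specify these details, and the imputation step itself is not shown.

```python
from itertools import combinations


def augment_with_pairwise_products(rows):
    """Append Y_s * Y_t for every pair s < t to each subject's responses.

    Each row holds one subject's binary outcomes over time, with None
    marking a missing response (an assumption of this sketch). A product
    is left as None whenever either of its two outcomes is missing.
    """
    augmented = []
    for row in rows:
        products = [
            None if row[s] is None or row[t] is None else row[s] * row[t]
            for s, t in combinations(range(len(row)), 2)
        ]
        augmented.append(list(row) + products)
    return augmented


# Hypothetical responses for three subjects at three time points,
# with a non-monotone pattern (subject 2 misses only the middle visit).
responses = [
    [1, 0, 1],
    [1, None, 1],
    [0, 1, None],
]
augmented = augment_with_pairwise_products(responses)
```

Each augmented row lists the three outcomes followed by the products for the time pairs (1,2), (1,3), and (2,3). With T time points the vector grows by T(T-1)/2 entries, so for the six monthly assessments in the illustrative trial it would gain 15 product terms.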
References
Bahadur, R. R. (1961). A representation of the joint distribution of responses to n dichotomous items. In H. Solomon (Ed.), Studies in item analysis and prediction (pp. 158–168). Stanford Mathematical Studies in the Social Sciences VI. Stanford: Stanford University Press.
Barnard, J., & Meng, X. L. (1999). Applications of multiple imputation in medical studies: From AIDS to NHANES. Statistical Methods in Medical Research, 8, 17–36.
Beunckens, C., Sotto, C., & Molenberghs, G. (2008). A simulation study comparing weighted estimating equations with multiple imputation based estimating equations for longitudinal binary data. Computational Statistics & Data Analysis, 52, 1533–1548.
Carey, V. J., Lumley, T., & Ripley, B. D. (2012). gee: Generalized estimation equation solver. http://CRAN.R-project.org/package=gee. R package version 4.13-18
Carey, V., Zeger, S. L., & Diggle, P. J. (1993). Modelling multivariate binary data with alternating logistic regressions. Biometrika, 80, 517–526.
Carpenter, J. R., & Kenward, M. G. (2013). Multiple imputation and its application. New York: Wiley.
Crits-Christoph, P. (1999). Psychosocial treatments for cocaine dependence: National Institute on Drug Abuse Collaborative Cocaine Treatment Study. Archives of General Psychiatry, 56, 493–502.
Enders, C. K. (2010). Applied missing data analysis. New York: The Guilford Press.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (Eds.). (1996). Markov Chain Monte Carlo in practice. New York: Chapman & Hall.
Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science, 8, 206–213.
Horton, N. J., & Lipsitz, S. R. (2001). Multiple imputation in practice: Comparison of software packages for regression models with missing variables. The American Statistician, 55, 244–254.
Horton, N. J., Parzen, M., & Lipsitz, S. R. (2003). A potential for bias when rounding in multiple imputation. The American Statistician, 57, 229–232.
Johnson, R. A., & Wichern, D. W. (2002). Applied multivariate statistical analysis. Upper Saddle River, NJ: Prentice Hall.
Laird, N. M. (1988). Missing data in longitudinal studies. Statistics in Medicine, 7, 305–315.
Liang, K. Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.
Lipsitz, S. R., Fitzmaurice, G. M., Orav, E. J., & Laird, N. M. (1994). Performance of generalized estimating equations in practical situations. Biometrics, 50, 270–278.
Lipsitz, S. R., Laird, N. M., & Harrington, D. P. (1992). A three-stage estimator for studies with repeated and possibly missing binary outcomes. Applied Statistics, 41, 203–213.
Lipsitz, S. R., Molenberghs, G., Fitzmaurice, G. M., & Ibrahim, J. (2000). GEE with Gaussian estimation of the correlations when data are incomplete. Biometrics, 56, 528–536.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley.
Liu, M., Taylor, J. M., & Belin, T. R. (2000). Multiple imputation and posterior simulation for multivariate missing data in longitudinal studies. Biometrics, 56, 1157–1163.
Paik, M. (1997). The generalized estimating equation approach when data are not missing completely at random. Journal of the American Statistical Association, 92, 1320–1329.
Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90, 106–121.
Rotnitzky, A., & Wypij, D. (1994). A note on the bias of estimators with missing data. Biometrics, 50, 1163–1170.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.
Rubin, D. B. (1978). Multiple imputations in sample surveys: A phenomenological Bayesian approach to nonresponse. In Proceedings of the International Statistical Institute, Manila (pp. 517–532).
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Rubin, D. B., & Schenker, N. (1986). Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. Journal of the American Statistical Association, 81, 366–374.
SAS Institute Inc. (2020). SAS/STAT Software, Version 9.4. Cary, NC. http://www.sas.com/.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall Ltd.
Schafer, J. L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8, 3–15.
Scheuren, F. (2005). Multiple imputation: How it began and continues. The American Statistician, 59, 315–319.
Tchetgen, E., Wang, L., & Sun, B. (2017). Discrete choice models for nonmonotone nonignorable missing data: Identification and inference. Unpublished manuscript. Archived as arXiv:1607.02631v3 [stat.ME].
Van Buuren, S. (2007). Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16, 219–242.
Van Buuren, S. (2012). Flexible imputation of missing data. Boca Raton, FL: Chapman & Hall/CRC.
Van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45, 1–68.
Zellner, A., & Rossi, P. E. (1984). Bayesian analysis of dichotomous quantal response models. Journal of Econometrics, 25, 365–393.
The authors were supported by grants from the National Institutes of Health, National Institute on Drug Abuse Grants [NIDA R33 DA042847, UG1 DA015831, K24 DA022288].
Lipsitz, S.R., Fitzmaurice, G.M. & Weiss, R.D. Using Multiple Imputation with GEE with Non-monotone Missing Longitudinal Binary Outcomes. Psychometrika 85, 890–904 (2020). https://doi.org/10.1007/s11336-020-09729-y
DOI: https://doi.org/10.1007/s11336-020-09729-y