Using Multiple Imputation with GEE with Non-monotone Missing Longitudinal Binary Outcomes

  • Theory and Methods
Psychometrika

Abstract

This paper considers multiple imputation (MI) approaches for handling non-monotone missing longitudinal binary responses when estimating parameters of a marginal model using generalized estimating equations (GEE). GEE has been shown to yield consistent estimates of the regression parameters for a marginal model when data are missing completely at random (MCAR). However, when data are missing at random (MAR), the GEE estimates may not be consistent; the MI approaches proposed in this paper minimize bias under MAR. The first proposed MI approach is based on a multivariate normal distribution, with pairwise products of the binary outcomes added to the multivariate normal vector. Although the multivariate normal does not impute 0 or 1 values for the missing binary responses, we suggest not rounding the imputed values because, as discussed by Horton et al. (Am Stat 57:229–232, 2003), rounding could increase bias. The second MI approach considered is the fully conditional specification (FCS) approach. In this approach, we specify a logistic regression model for each outcome given the outcomes at the other time points and the covariates. Typically, one would include only main effects of the outcomes at the other times as predictors in the FCS approach, but we explore whether bias can be reduced by also including pairwise interactions of the outcomes at the other time points in the FCS models. In a study of asymptotic bias with non-monotone missing data, the proposed MI approaches are also compared to GEE without imputation. Finally, the proposed methods are illustrated using data from a longitudinal clinical trial comparing four psychosocial treatments from the National Institute on Drug Abuse Collaborative Cocaine Treatment Study, in which patients’ cocaine use is recorded monthly for 6 months during treatment.
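
To make the two imputation models in the abstract more concrete, the following minimal Python sketch builds the variables each one would work with for a single subject with six monthly binary outcomes. It is only an illustration, not the authors' implementation: the function names `augment_with_products` and `fcs_predictors`, the example data, and the single treatment covariate are all hypothetical. The sketch also assumes that a pairwise product is treated as missing whenever either factor is missing, a detail the abstract does not specify.

```python
from itertools import combinations


def augment_with_products(y):
    """Return the outcomes followed by all pairwise products y[j] * y[k], j < k.

    Assumption of this sketch: a product is marked missing (None)
    whenever either of its factors is missing.
    """
    products = [
        None if y[j] is None or y[k] is None else y[j] * y[k]
        for j, k in combinations(range(len(y)), 2)
    ]
    return list(y) + products


def fcs_predictors(y, target, covariates, interactions=True):
    """Predictor row for the logistic model of y[target] in one FCS step.

    Assumes the other outcomes are currently filled in (observed or imputed).
    """
    others = [y[j] for j in range(len(y)) if j != target]
    # Intercept, covariates, then main effects of the other outcomes.
    row = [1.0] + list(covariates) + others
    if interactions:
        # Optional pairwise interactions among the other outcomes.
        row += [a * b for a, b in combinations(others, 2)]
    return row


# Hypothetical subject: 6 monthly binary outcomes, months 2 and 5 missing.
y = [1, None, 0, 1, None, 1]
augmented = augment_with_products(y)  # 6 outcomes + 15 pairwise products

# The same subject after a hypothetical fill-in, with one covariate (treatment = 1).
y_filled = [1, 1, 0, 1, 0, 1]
row = fcs_predictors(y_filled, target=1, covariates=[1.0])
```

With six time points the augmented vector for the multivariate normal approach has 6 + 15 = 21 elements, while the FCS model for any one outcome has 5 main effects and, if interactions are included, 10 pairwise interaction terms in addition to the intercept and covariates. In the multivariate normal approach the imputed entries would be continuous values left unrounded, as the abstract recommends; the sketch stops short of the imputation draws themselves.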

References

  • Bahadur, R. R. (1961). A representation of the joint distribution of responses to n dichotomous items. In H. Solomon (Ed.), Studies in item analysis and prediction (pp. 158–168). Stanford mathematical studies in the social sciences VI. Stanford: Stanford University Press.

  • Barnard, J., & Meng, X. L. (1999). Applications of multiple imputation in medical studies: From AIDS to NHANES. Statistical Methods in Medical Research, 8, 17–36.

  • Beunckens, C., Sotto, C., & Molenberghs, G. (2008). A simulation study comparing weighted estimating equations with multiple imputation based estimating equations for longitudinal binary data. Computational Statistics & Data Analysis, 52, 1533–1548.

  • Carey, V. J., Lumley, T., & Ripley, B. D. (2012). gee: Generalized estimation equation solver. http://CRAN.R-project.org/package=gee. R package version 4.13-18

  • Carey, V., Zeger, S. L., & Diggle, P. J. (1993). Modelling multivariate binary data with alternating logistic regressions. Biometrika, 80, 517–526.

  • Carpenter, J. R., & Kenward, M. G. (2013). Multiple imputation and its application. New York: Wiley.

  • Crits-Christoph, P. (1999). Psychosocial treatments for cocaine dependence: National Institute on Drug Abuse Collaborative Cocaine Treatment Study. Archives of General Psychiatry, 56, 493–502.

  • Enders, C. K. (2010). Applied missing data analysis. New York: The Guilford Press.

  • Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (Eds.). (1996). Markov Chain Monte Carlo in practice. New York: Chapman & Hall.

  • Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science, 8, 206–213.

  • Horton, N. J., & Lipsitz, S. R. (2001). Multiple imputation in practice: Comparison of software packages for regression models with missing variables. The American Statistician, 55, 244–254.

  • Horton, N. J., Parzen, M., & Lipsitz, S. R. (2003). A potential for bias when rounding in multiple imputation. The American Statistician, 57, 229–232.

  • Johnson, R. A., & Wichern, D. W. (2002). Applied multivariate statistical analysis. Upper Saddle River, NJ: Prentice Hall.

  • Laird, N. M. (1988). Missing data in longitudinal studies. Statistics in Medicine, 7, 305–315.

  • Liang, K. Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.

  • Lipsitz, S. R., Fitzmaurice, G. M., Orav, E. J., & Laird, N. M. (1994). Performance of generalized estimating equations in practical situations. Biometrics, 50, 270–278.

  • Lipsitz, S. R., Laird, N. M., & Harrington, D. P. (1992). A three-stage estimator for studies with repeated and possibly missing binary outcomes. Applied Statistics, 41, 203–213.

  • Lipsitz, S. R., Molenberghs, G., Fitzmaurice, G. M., & Ibrahim, J. (2000). GEE with Gaussian estimation of the correlations when data are incomplete. Biometrics, 56, 528–536.

  • Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley.

  • Liu, M., Taylor, J. M., & Belin, T. R. (2000). Multiple imputation and posterior simulation for multivariate missing data in longitudinal studies. Biometrics, 56, 1157–1163.

  • Paik, M. (1997). The generalized estimating equation approach when data are not missing completely at random. Journal of the American Statistical Association, 92, 1320–1329.

  • Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90, 106–121.

  • Rotnitzky, A., & Wypij, D. (1994). A note on the bias of estimators with missing data. Biometrics, 50, 1163–1170.

  • Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.

  • Rubin, D. B. (1978). Multiple imputations in sample surveys: A phenomenological Bayesian approach to nonresponse. In Proceedings of the International Statistical Institute, Manila (pp. 517–532).

  • Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

  • Rubin, D. B., & Schenker, N. (1986). Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. Journal of the American Statistical Association, 81, 366–374.

  • SAS Institute Inc. (2020). SAS/STAT Software, Version 9.4. Cary, NC. http://www.sas.com/.

  • Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall Ltd.

  • Schafer, J. L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8, 3–15.

  • Scheuren, F. (2005). Multiple imputation: How it began and continues. The American Statistician, 59, 315–319.

  • Tchetgen, E., Wang, L., & Sun, B. (2017). Discrete choice models for nonmonotone nonignorable missing data: Identification and inference. Unpublished manuscript. Archived as arXiv:1607.02631v3 [stat.ME].

  • Van Buuren, S. (2007). Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16, 219–242.

  • Van Buuren, S. (2012). Flexible imputation of missing data. Boca Raton, FL: Chapman & Hall/CRC.

  • Van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45, 1–68.

  • Zellner, A., & Rossi, P. E. (1984). Bayesian analysis of dichotomous quantal response models. Journal of Econometrics, 25, 365–393.


Author information

Corresponding author

Correspondence to Stuart R. Lipsitz.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors were supported by grants from the National Institutes of Health, National Institute on Drug Abuse [NIDA R33 DA042847, UG1 DA015831, K24 DA022288].

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 39 KB)

About this article

Cite this article

Lipsitz, S.R., Fitzmaurice, G.M. & Weiss, R.D. Using Multiple Imputation with GEE with Non-monotone Missing Longitudinal Binary Outcomes. Psychometrika 85, 890–904 (2020). https://doi.org/10.1007/s11336-020-09729-y
