Using Multiple Imputation with GEE with Non-monotone Missing Longitudinal Binary Outcomes

Lipsitz, Stuart R.; Fitzmaurice, Garrett M.; Weiss, Roger D.

doi:10.1007/s11336-020-09729-y

Theory and Methods
Published: 02 October 2020

Volume 85, pages 890–904, (2020)

Psychometrika

Stuart R. Lipsitz ORCID: orcid.org/0000-0003-2619-1389¹,
Garrett M. Fitzmaurice² &
Roger D. Weiss²

Abstract

This paper considers multiple imputation (MI) approaches for handling non-monotone missing longitudinal binary responses when estimating parameters of a marginal model using generalized estimating equations (GEE). GEE has been shown to yield consistent estimates of the regression parameters for a marginal model when data are missing completely at random (MCAR). However, when data are missing at random (MAR), the GEE estimates may not be consistent; the MI approaches proposed in this paper minimize bias under MAR. The first MI approach proposed is based on a multivariate normal distribution, but with the addition of pairwise products among the binary outcomes to the multivariate normal vector. Even though the multivariate normal does not impute 0 or 1 values for the missing binary responses, as discussed by Horton et al. (Am Stat 57:229–232, 2003), we suggest not rounding when filling in the missing binary data because it could increase bias. The second MI approach considered is the fully conditional specification (FCS) approach. In this approach, we specify a logistic regression model for each outcome given the outcomes at other time points and the covariates. Typically, one would only include main effects of the outcomes at the other times as predictors in the FCS approach, but we explore whether bias can be reduced by also including pairwise interactions of the outcomes at other time points in the FCS models. In a study of asymptotic bias with non-monotone missing data, the proposed MI approaches are also compared to GEE without imputation. Finally, the proposed methods are illustrated using data from a longitudinal clinical trial comparing four psychosocial treatments from the National Institute on Drug Abuse Collaborative Cocaine Treatment Study, where patients’ cocaine use is collected monthly for 6 months during treatment.
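
The pairwise terms mentioned in the abstract can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the authors' implementation: the function names (`pairwise_products`, `mvn_augmented_vector`, `fcs_predictors`), the use of `None` for a missing response, and the example subject are all hypothetical, and covariates are omitted for brevity. It shows which terms would enter the augmented multivariate normal vector and which predictors would enter the FCS logistic model for one time point.

```python
from itertools import combinations
from typing import List, Optional, Sequence

Binary = Optional[int]  # 0, 1, or None for a missing response (assumed coding)


def pairwise_products(values: Sequence[Binary]) -> List[Binary]:
    """Products y_j * y_k for all j < k; None if either factor is missing."""
    return [
        None if a is None or b is None else a * b
        for a, b in combinations(values, 2)
    ]


def mvn_augmented_vector(y: Sequence[Binary]) -> List[Binary]:
    """Outcome vector followed by its pairwise products (first MI approach)."""
    return list(y) + pairwise_products(y)


def fcs_predictors(y: Sequence[Binary], t: int, interactions: bool = True) -> List[Binary]:
    """Predictors for the model of y[t]: outcomes at the other times,
    optionally followed by their pairwise interactions (second MI approach)."""
    others = [v for j, v in enumerate(y) if j != t]
    return others + (pairwise_products(others) if interactions else [])


# Hypothetical subject: 6 monthly binary responses, months 2 and 5 missing.
y = [1, None, 0, 1, None, 1]
augmented = mvn_augmented_vector(y)
predictors = fcs_predictors(y, t=1)
print(len(augmented))   # 6 outcomes + 15 pairwise products
print(len(predictors))  # 5 main effects + 10 pairwise interactions
```

With six monthly outcomes, as in the trial described above, the augmented vector has 6 + 15 = 21 components, and each FCS model has 5 main effects plus 10 pairwise interactions among the outcomes at the other times, before any covariates are added.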

References

Bahadur, R. R. (1961). A representation of the joint distribution of responses to n dichotomous items. In H. Solomon (Ed.), Studies in item analysis and prediction (pp. 158–68). Stanford Mathematical Studies in the Social Sciences VI. Stanford: Stanford University Press.
Barnard, J., & Meng, X. L. (1999). Applications of multiple imputation in medical studies: From AIDS to NHANES. Statistical Methods in Medical Research, 8, 17–36.
Beunckens, C., Sotto, C., & Molenberghs, G. (2008). A simulation study comparing weighted estimating equations with multiple imputation based estimating equations for longitudinal binary data. Computational Statistics & Data Analysis, 52, 1533–1548.
Carey, V. J., Lumley, T., & Ripley, B. D. (2012). gee: Generalized estimation equation solver. http://CRAN.R-project.org/package=gee. R package version 4.13-18
Carey, V., Zeger, S. L., & Diggle, P. J. (1993). Modelling multivariate binary data with alternating logistic regressions. Biometrika, 80, 517–526.
Carpenter, J. R., & Kenward, M. G. (2013). Multiple imputation and its application. New York: Wiley.
Crits-Christoph, P. (1999). Psychosocial treatments for cocaine dependence: National institute on drug abuse collaborative cocaine treatment study. Archives of General Psychiatry, 56, 493–502.
Enders, C. K. (2010). Applied missing data analysis. New York: The Guilford Press.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. E. (1996). Markov Chain Monte Carlo in practice. New York: Chapman & Hall.
Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science, 8, 206–213.
Horton, N. J., & Lipsitz, S. R. (2001). Multiple imputation in practice: Comparison of software packages for regression models with missing variables. The American Statistician, 55, 244–254.
Horton, N. J., Parzen, M., & Lipsitz, S. R. (2003). A potential for bias when rounding in multiple imputation. The American Statistician, 57, 229–232.
Johnson, R. A., & Wichern, D. W. (2002). Applied multivariate statistical analysis. Upper Saddle River, NJ: Prentice Hall.
Laird, N. M. (1988). Missing data in longitudinal studies. Statistics in Medicine, 7, 305–315.
Liang, K. Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.
Lipsitz, S. R., Fitzmaurice, G. M., Orav, E. J., & Laird, N. M. (1994). Performance of generalized estimating equations in practical situations. Biometrics, 50, 270–278.
Lipsitz, S. R., Laird, N. M., & Harrington, D. P. (1992). A three-stage estimator for studies with repeated and possibly missing binary outcomes. Applied Statistics, 41, 203–213.
Lipsitz, S. R., Molenberghs, G., Fitzmaurice, G. M., & Ibrahim, J. (2000). GEE with Gaussian estimation of the correlations when data are incomplete. Biometrics, 56, 528–536.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley.
Liu, M., Taylor, J. M., & Belin, T. R. (2000). Multiple imputation and posterior simulation for multivariate missing data in longitudinal studies. Biometrics, 56, 1157–1163.
Paik, M. (1997). The generalized estimating equation approach when data are not missing completely at random. Journal of the American Statistical Association, 92, 1320–1329.
Robins, J. M., Rotnitzky, A., & Zhao, L. P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90, 106–121.
Rotnitzky, A., & Wypij, D. (1994). A note on the bias of estimators with missing data. Biometrics, 50, 1163–1170.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581–592.
Rubin, D. B. (1978). Multiple imputations in sample surveys—A phenomenological Bayesian approach to nonresponse. In Proceedings of the International Statistical Institute, Manila (pp. 517–532).
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Rubin, D. B., & Schenker, N. (1986). Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. Journal of the American Statistical Association, 81, 366–374.
SAS Institute Inc. (2020). SAS/STAT Software, Version 9.4. Cary, NC. http://www.sas.com/.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall Ltd.
Schafer, J. L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8, 3–15.
Scheuren, F. (2005). Multiple imputation: How it began and continues. The American Statistician, 59, 315–319.
Tchetgen, E., Wang, L. & Sun, B. (2017). Discrete choice models for nonmonotone nonignorable missing data: Identification and inference. Unpublished Manuscript. Archived as arXiv:1607.02631v3 [stat.ME].
Van Buuren, S. (2007). Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16, 219–242.
Van Buuren, S. (2012). Flexible imputation of missing data. Boca Raton, FL: Chapman & Hall/CRC.
Van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45, 1–68.
Zellner, A., & Rossi, P. E. (1984). Bayesian analysis of dichotomous quantal response models. Journal of Econometrics, 25, 365–393.

Author information

Authors and Affiliations

Division of General Internal Medicine, Brigham and Women’s Hospital and Ariadne Labs, 1620 Tremont St. 3rd Floor, BC3 002D, Boston, MA, 02120-1613, USA
Stuart R. Lipsitz
McLean Hospital, Belmont, MA, USA
Garrett M. Fitzmaurice & Roger D. Weiss

Corresponding author

Correspondence to Stuart R. Lipsitz.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors were supported by grants from the National Institutes of Health, National Institute on Drug Abuse Grants [NIDA R33 DA042847, UG1 DA015831, K24 DA022288].

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 39 KB)

About this article

Cite this article

Lipsitz, S.R., Fitzmaurice, G.M. & Weiss, R.D. Using Multiple Imputation with GEE with Non-monotone Missing Longitudinal Binary Outcomes. Psychometrika 85, 890–904 (2020). https://doi.org/10.1007/s11336-020-09729-y

Received: 07 November 2019
Accepted: 14 September 2020
Published: 02 October 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s11336-020-09729-y
