Abstract
The unified approach (Chen and Chen in J R Stat Soc B 62(3):449–460, 2000) uses a working regression model to extract information from auxiliary variables in a two-stage study and thereby compute an efficient estimator of the regression parameters. As far as we know, the method is limited to data that are missing completely at random in a simple monotone missing data pattern. In this research, we extend the unified approach to estimate regression models with nonmonotone missing at random data. We describe an inverse probability weighting estimator conditional on estimators from a set of working regression models that contain information from incomplete data and auxiliary variables. The proposed method is flexible and can easily accommodate incomplete data and auxiliary variables. We investigate the finite-sample performance of the proposed estimators using simulation studies and further illustrate the estimation method on a case–control study investigating the risk factors of hip fractures.
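The inverse probability weighting idea mentioned above can be illustrated with a minimal sketch. This is not the estimator proposed in the article: it is a generic weighted least-squares fit on simulated data, assuming a single covariate that is missing at random and selection probabilities that are known by design. The function name `ipw_least_squares`, the simulated model, and all numeric settings are hypothetical choices made for illustration only.

```python
import numpy as np

def ipw_least_squares(X, y, observed, prob):
    """Solve sum_i (R_i / pi_i) * x_i * (y_i - x_i' beta) = 0 for beta.

    R_i (observed) is 1 for complete rows and 0 otherwise; pi_i (prob) is
    the probability that row i is complete. Both are assumed known here.
    """
    w = observed / prob                             # incomplete rows get weight 0
    Xc = np.where(observed[:, None] == 1, X, 0.0)   # never use unobserved values
    XtW = Xc.T * w                                  # weighted design, shape (p, n)
    return np.linalg.solve(XtW @ Xc, XtW @ y)

# Simulated example (assumed model): y = 1 + 2 x + noise.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# The covariate x is observed with a probability that depends only on the
# always-observed response y, so the data are missing at random.
prob = 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * y)))
observed = rng.binomial(1, prob)

X = np.column_stack([np.ones(n), x])
beta_ipw = ipw_least_squares(X, y, observed, prob)        # weighted by 1 / pi_i
beta_cc = ipw_least_squares(X, y, observed, np.ones(n))   # unweighted complete cases

print("IPW estimate:          ", beta_ipw)
print("Complete-case estimate:", beta_cc)
```

Because selection depends on the response, the unweighted complete-case fit is generally biased in this setup, while weighting each complete row by the inverse of its selection probability targets the full-data regression. In practice the selection probabilities would usually have to be estimated, which this sketch does not attempt.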
References
Barengolts, E., Karanouh, D., Kolodny, L., Kukreja, S.: Risk factors for hip fractures in predominantly African-American veteran male population. J. Bone Miner. Res. 16, S170 (2001)
Breslow, N.E., Lumley, T., Ballantyne, C.M., Chambless, L.E., Kulich, M.: Improved Horvitz-Thompson estimation of model parameters from two-phase stratified samples: applications in epidemiology. Stat. Biosciences 1, 32–49 (2009)
Breunig, C., Haan, P.: Nonparametric regression with selectively missing covariates. arXiv:1810.00411v2 [econ.EM], 1–37 (2019)
Breunig, C., Mammen, E., Simoni, A.: Nonparametric estimation in case of endogenous selection. J. Econom. 202, 268–285 (2018)
Chatterjee, N., Chen, Y., Breslow, N.E.: A pseudo-score estimator for regression problems with two-phase sampling. J. Am. Stat. Assoc. 98, 158–168 (2003)
Chatterjee, N., Li, Y.: Inference in semiparametric regression models under partial questionnaire design and nonmonotone missing data. J. Am. Stat. Assoc. 105, 787–797 (2010)
Chen, H.Y.: Nonparametric and semiparametric models for missing covariates in parametric regression. J. Am. Stat. Assoc. 99, 1176–1189 (2004)
Chen, H.Y., Xie, H., Qian, Y.: Multiple imputation for missing values through conditional semiparametric odds ratio models. Biometrics 67, 799–809 (2011)
Chen, Y.H., Chen, H.: A unified approach to regression analysis under double-sampling designs. J. R. Stat. Soc. B 62(3), 449–460 (2000)
Fitzmaurice, G., Davidian, M., Verbeke, G., Molenberghs, G.: Longitudinal data analysis. Chapman and Hall/CRC, Boca Raton (2009)
Han, P.: Multiply robust estimation in regression analysis with missing data. J. Am. Stat. Assoc. 109, 1159–1173 (2014)
Horvitz, D.G., Thompson, D.J.: A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 47, 663–685 (1952)
Ibrahim, J.G., Chen, M.H., Lipsitz, S.R., Herring, A.H.: Missing-data methods for generalized linear models: A comparative review. J. Am. Stat. Assoc. 100, 332–346 (2005)
van der Laan, M.J., Robins, J.M.: Unified Methods for Censored Longitudinal Data and Causality. Springer-Verlag, New York (2003)
Lawless, J.F., Kalbfleisch, J.D., Wild, C.J.: Semiparametric methods for response-selective and missing data problems in regression. J. R. Stat. Soc. B 61(2), 413–438 (1999)
Lipsitz, S.R., Ibrahim, J.G.: A conditional model for incomplete covariates in parametric regression models. Biometrika 83(4), 916–922 (1996)
Lipsitz, S.R., Ibrahim, J.G., Zhao, L.: A weighted estimating equation for missing covariate data with properties similar to maximum likelihood. J. Am. Stat. Assoc. 94, 1147–1160 (1999)
Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data. Wiley, New York (2002)
Robins, J.M., Rotnitzky, A., Zhao, L.P.: Estimation of regression coefficients when some regressors are not always observed. J. Am. Stat. Assoc. 89, 846–866 (1994)
Rubin, D.B.: Inference and missing data. Biometrika 63(3), 581–592 (1976)
Rubin, D.B.: Multiple Imputation for Nonresponse in Surveys. Wiley, New York (1987)
Rubin, D.B.: Multiple imputation after 18+ years. J. Am. Stat. Assoc. 91, 473–489 (1996)
Scheuren, F.: Multiple imputation: how it began and continues. Am. Stat. 59, 315–319 (2005)
Sun, B., Tchetgen, E.J.T.: On inverse probability weighting for nonmonotone missing at random data. J. Am. Stat. Assoc. 113, 369–379 (2018)
Tsiatis, A.: Semiparametric Theory and Missing Data. Springer, New York (2006)
Wacholder, S., Carroll, R.J., Pee, D., Gail, M.G.: The partial questionnaire design for case-control studies. Stat. Med. 13, 623–634 (1994)
Zhao, L.P., Lipsitz, S.: Designs and analysis of two-stage studies. Stat. Med. 11, 769–782 (1992)
Zhao, Y.: Statistical inference for missing data mechanisms. Stat. Med. (2020). https://doi.org/10.1002/sim.8727
Zhao, Y., Lawless, J.F., McLeish, D.L.: Likelihood methods for regression models with expensive variables missing by design. Biometrical J. 51, 123–136 (2009)
Acknowledgements
We thank Professor Donald L. McLeish, Professor Jerald F. Lawless, the associate editor, and the anonymous reviewers for their helpful comments and suggestions. We are grateful to Professor Hua Yun Chen for letting us use the hip fracture data.
This research was partially supported by a grant from the Natural Sciences and Engineering Research Council of Canada (YZ).
About this article
Cite this article
Zhao, Y., Liu, M. Unified approach for regression models with nonmonotone missing at random data. AStA Adv Stat Anal 105, 87–101 (2021). https://doi.org/10.1007/s10182-020-00389-y
DOI: https://doi.org/10.1007/s10182-020-00389-y