Abstract
In this paper, we propose improved statistical inference and variable selection methods for generalized linear models, based on an empirical likelihood (EL) approach that accommodates both within-subject correlations and nonignorable dropouts. We first apply the generalized method of moments to estimate the parameters of the nonignorable dropout propensity with the help of an instrument. Inverse probability weighting is then applied to obtain bias-corrected generalized estimating equations (GEEs). Borrowing the ideas of the quadratic inference function and the hybrid GEE, we construct EL procedures based on each for longitudinal data with nonignorable dropouts. Two different classes of estimators and their confidence regions are derived. Further, a penalized EL method and an algorithm for variable selection are investigated. The finite-sample performance of the proposed estimators is studied through simulation, and an application to an HIV-CD4 data set is also presented.
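To illustrate why inverse probability weighting removes the bias caused by nonignorable dropout, the following is a minimal sketch, not the procedure developed in the paper. It assumes the dropout propensity is known exactly, whereas the paper estimates it by the generalized method of moments with an instrument. It also targets a simple mean rather than GEE regression parameters. The function name `ipw_mean`, the logistic propensity, and the simulated data are all illustrative assumptions.

```python
import numpy as np


def ipw_mean(y, observed, propensity):
    """Normalized (Hajek-type) inverse-probability-weighted mean.

    Only observed outcomes contribute; each is weighted by 1 / propensity.
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    propensity = np.asarray(propensity, dtype=float)
    weights = observed / propensity          # zero weight for dropouts
    y_filled = np.where(observed, y, 0.0)    # dropouts never enter the sum
    return np.sum(weights * y_filled) / np.sum(weights)


# Simulated outcomes with true mean 1.0 (illustrative data, not from the paper).
rng = np.random.default_rng(0)
n = 50_000
y = rng.normal(loc=1.0, scale=1.0, size=n)

# Nonignorable mechanism: the chance of being observed depends on y itself.
# The propensity is treated as known here purely for illustration.
propensity = 1.0 / (1.0 + np.exp(-(0.5 + y)))
observed = rng.random(n) < propensity

naive = y[observed].mean()                      # complete-case mean, biased upward
corrected = ipw_mean(y, observed, propensity)   # weighted mean, close to 1.0

print(f"naive complete-case mean: {naive:.3f}")
print(f"IPW-corrected mean:       {corrected:.3f}")
```

In this toy setting, larger outcomes are more likely to be observed, so the complete-case mean overshoots the truth, while the weighted mean recovers it. In practice the propensity must be estimated, and that estimation step is what the instrument-based approach addresses.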
Acknowledgements
We are grateful to the Editor, the Associate Editor and two anonymous referees for their insightful comments and suggestions, which have led to significant improvements. Our research was supported by the National Natural Science Foundation of China (11871287, 11501208, 11771144, 11801359), the Natural Science Foundation of Tianjin (18JCYBJC41100), the Fundamental Research Funds for the Central Universities, and the Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin. The two authors contributed equally to this work.
Electronic supplementary material
The Supplementary Material contains proofs of the theorems and other technical material. (pdf 220KB)
About this article
Cite this article
Wang, L., Ma, W. Improved empirical likelihood inference and variable selection for generalized linear models with longitudinal nonignorable dropouts. Ann Inst Stat Math 73, 623–647 (2021). https://doi.org/10.1007/s10463-020-00761-4