Improved empirical likelihood inference and variable selection for generalized linear models with longitudinal nonignorable dropouts

Annals of the Institute of Statistical Mathematics

Abstract

In this paper, we propose improved statistical inference and variable selection methods for generalized linear models, based on an empirical likelihood (EL) approach that accommodates both within-subject correlations and nonignorable dropouts. We first apply the generalized method of moments to estimate the parameters of the nonignorable dropout propensity, using an instrument. Inverse probability weighting is then applied to obtain bias-corrected generalized estimating equations (GEEs). Borrowing the ideas of the quadratic inference function and the hybrid GEE, we construct a corresponding EL procedure from each for longitudinal data with nonignorable dropouts, and derive two different classes of estimators and their confidence regions. Further, a penalized EL method and an algorithm for variable selection are investigated. The finite-sample performance of the proposed estimators is studied through simulation, and an application to an HIV-CD4 data set is also presented.
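The bias-correction step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes a single outcome per subject, a dropout propensity that is known exactly (the paper estimates it by the generalized method of moments with an instrument), and a simple mean as the target parameter. The function names, the logistic propensity, and its coefficients are hypothetical choices made only for illustration.

```python
import math
import random


def expit(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n, seed=0):
    """Draw outcomes y and response indicators r with P(r = 1 | y) = expit(0.5 + y).

    The propensity depends on y itself, so the missingness is nonignorable.
    The model and its coefficients are assumptions of this sketch.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.gauss(1.0, 1.0)      # true mean of y is 1.0
        pi = expit(0.5 + y)          # probability that y is observed
        r = 1 if rng.random() < pi else 0
        data.append((y, r, pi))
    return data


def naive_mean(data):
    """Average of the observed outcomes only (biased under nonignorable dropout)."""
    observed = [y for y, r, _ in data if r == 1]
    return sum(observed) / len(observed)


def ipw_mean(data):
    """Solve the weighted estimating equation sum_i (r_i / pi_i) * (y_i - mu) = 0."""
    num = sum(y / pi for y, r, pi in data if r == 1)
    den = sum(1.0 / pi for y, r, pi in data if r == 1)
    return num / den


data = simulate(20000)
naive = naive_mean(data)
ipw = ipw_mean(data)
print(f"naive mean of observed outcomes: {naive:.3f}")
print(f"inverse-probability-weighted mean: {ipw:.3f}")
```

In this toy setting, subjects with small outcomes are more likely to drop out, so the unweighted average overstates the mean, while weighting each observed outcome by the inverse of its response probability removes that bias. The EL constructions in the paper build on weighted estimating equations of this kind rather than on this simple mean estimator.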

Acknowledgements

We are grateful to the Editor, the Associate Editor and two anonymous referees for their insightful comments and suggestions, which have led to significant improvements. Our research was supported by the National Natural Science Foundation of China (11871287, 11501208, 11771144, 11801359), the Natural Science Foundation of Tianjin (18JCYBJC41100), the Fundamental Research Funds for the Central Universities, and the Key Laboratory for Medical Data Analysis and Statistical Research of Tianjin. The two authors contributed equally to this work.

Author information

Corresponding author

Correspondence to Lei Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

The Supplementary Material contains proofs of the theorems and other technical material. (pdf 220KB)

About this article

Cite this article

Wang, L., Ma, W. Improved empirical likelihood inference and variable selection for generalized linear models with longitudinal nonignorable dropouts. Ann Inst Stat Math 73, 623–647 (2021). https://doi.org/10.1007/s10463-020-00761-4

  • DOI: https://doi.org/10.1007/s10463-020-00761-4
