Penalized generalized estimating equations approach to longitudinal data with multinomial responses

  • Research Article
Journal of the Korean Statistical Society

Abstract

In high-dimensional longitudinal data with multinomial responses, the number of covariates is typically much larger than the number of subjects, so variable selection is an important issue when modelling such data. In this study, we develop a penalized generalized estimating equation for multinomial responses that identifies important variables and estimates their regression coefficients simultaneously. The penalized estimating equation is solved by an iterative algorithm that combines the Fisher-scoring algorithm with the minorization-maximization algorithm. The penalty term regularizes the slope coefficients only, because category-specific intercept terms should be included in the multinomial model. We conducted a simulation study to investigate the performance of the proposed method and demonstrated its use on a real dataset.
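
As a rough illustration of how a Fisher-scoring step and a minorization-maximization step can be combined while leaving the intercepts unpenalized, the following Python sketch performs a single coefficient update. It is not the PGEE_M implementation. The abstract does not name the penalty, so the SCAD penalty is assumed here purely for illustration, and the function names `scad_derivative` and `penalized_update`, the constants `a` and `eps`, and the premise that the estimating function and its information-type matrix have already been evaluated are all hypothetical.

```python
import numpy as np


def scad_derivative(beta_abs, lam, a=3.7):
    """Derivative of the SCAD penalty evaluated at |beta| (assumed penalty)."""
    beta_abs = np.asarray(beta_abs, dtype=float)
    return lam * np.where(
        beta_abs <= lam,
        1.0,
        np.maximum(a * lam - beta_abs, 0.0) / ((a - 1.0) * lam),
    )


def penalized_update(beta, score, info, penalized, lam, n, eps=1e-6):
    """One combined Fisher-scoring / MM step (illustrative sketch).

    beta      : current coefficients, intercepts and slopes stacked
    score     : estimating function evaluated at beta (assumed given)
    info      : information-type matrix evaluated at beta (assumed given)
    penalized : boolean mask, False for category-specific intercepts
    n         : number of subjects
    """
    abs_beta = np.abs(beta)
    # MM step: the penalty is majorized by a quadratic, which contributes
    # a diagonal matrix; entries for intercepts are set to zero so that
    # only the slopes are shrunk.
    e_diag = np.where(
        penalized, scad_derivative(abs_beta, lam) / (eps + abs_beta), 0.0
    )
    E = np.diag(e_diag)
    # Fisher-scoring-type step on the penalized estimating equation.
    step = np.linalg.solve(info + n * E, score - n * (E @ beta))
    return beta + step


# Toy usage: one intercept (unpenalized) and one small slope (penalized).
beta = np.array([1.0, 0.05])
new_beta = penalized_update(
    beta,
    score=np.zeros(2),
    info=np.eye(2),
    penalized=np.array([False, True]),
    lam=0.1,
    n=10,
)
print(new_beta)
```

In this toy call the estimating function is set to zero, so the intercept is left unchanged and the small slope is pulled towards zero by the penalty alone. In practice the step would be repeated until the coefficients stabilize, with the estimating function and information-type matrix recomputed from the data at every iteration.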

Data availability

An example dataset is available at http://statgen.snu.ac.kr/software/PGEE_M/.

Code availability

An implementation of PGEE_M is available at http://statgen.snu.ac.kr/software/PGEE_M/.

Acknowledgements

The authors would like to thank Apio Catherine for English editing.

Funding

This research was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (Grant number: HI16C2037), and by the Bio & Medical Technology Development Program of the National Research Foundation of Korea (NRF) (Grant number: 2013M3A9C4078158).

Author information

Contributions

TP developed the idea and the method. MK performed all analyses. MK and TP drafted the manuscript. TP revised the manuscript. OK provided the real data.

Corresponding author

Correspondence to Taesung Park.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Ethics approval

The protocol was approved by the Institutional Review Board of Ewha Womans University (No. 61–12). It is also registered with the World Health Organization (WHO) International Clinical Trials Registry Platform (ICTRP, No. KCT0001241).

Consent to participate

All participants have provided consent.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kamruzzaman, M., Kwon, O. & Park, T. Penalized generalized estimating equations approach to longitudinal data with multinomial responses. J. Korean Stat. Soc. 50, 844–859 (2021). https://doi.org/10.1007/s42952-021-00134-4

  • DOI: https://doi.org/10.1007/s42952-021-00134-4
