Penalized generalized estimating equations approach to longitudinal data with multinomial responses

Kamruzzaman, Md.; Kwon, Oran; Park, Taesung

doi:10.1007/s42952-021-00134-4

Research Article
Published: 20 July 2021

Volume 50, pages 844–859, (2021)
Journal of the Korean Statistical Society

Abstract

In high-dimensional longitudinal data with multinomial responses, the number of covariates is often much larger than the number of subjects, so variable selection is an important issue when modelling such data. In this study, we developed a penalized generalized estimating equation for multinomial responses that identifies important variables and estimates their regression coefficients simultaneously. The penalized estimating equation is solved by an iterative algorithm that combines the Fisher-scoring algorithm with the minorization-maximization (MM) algorithm. The penalty term regularizes only the slope coefficients, because the category-specific intercept terms must be retained in the multinomial model. We conducted a simulation study to investigate the performance of the proposed method and demonstrated its use on a real dataset.
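The abstract does not state the estimating equation or the update rule, so the following is only an illustrative sketch of how Fisher scoring and an MM approximation of the penalty can be combined while leaving the intercepts unpenalized. It assumes a baseline-category logit model, a working-independence correlation structure (the working correlation used by the authors is not given here), the SCAD penalty of Fan and Li (2001) as an example penalty, and a perturbed MM (local quadratic) approximation in the style of Hunter and Li (2005). Each iteration computes beta_new = beta + (H + n E)^(-1) (S - n E beta), where S is the estimating function, H the Fisher information, n the number of subjects and E = diag(q_lambda(|beta_j|) / (eps + |beta_j|)) with E_jj = 0 for the intercepts. The names `scad_deriv`, `design` and `pgee_multinomial`, the perturbation `eps = 1e-6` and the zero threshold `1e-3` are hypothetical choices, not the authors' PGEE_M implementation.

```python
import numpy as np


def scad_deriv(theta, lam, a=3.7):
    """Derivative q_lambda(|theta|) of the SCAD penalty (Fan & Li, 2001)."""
    theta = np.abs(theta)
    # lam on [0, lam]; (a*lam - theta)_+ / (a - 1) beyond lam
    return np.where(theta <= lam, lam, np.maximum(a * lam - theta, 0.0) / (a - 1))


def design(x, K):
    """Hypothetical K x (K + K*p) design block for one observation:
    K category-specific intercepts followed by K blocks of p slopes."""
    return np.hstack([np.eye(K), np.kron(np.eye(K), x[None, :])])


def pgee_multinomial(X, Y, J, n_subjects, lam, eps=1e-6, tol=1e-6, max_iter=100):
    """Sketch of a penalized estimating-equation fit under working independence.

    X: (N, p) covariates, one row per subject-time observation.
    Y: (N,) category labels in {0, ..., J-1}; category 0 is the baseline.
    """
    p = X.shape[1]
    K = J - 1
    beta = np.zeros(K + K * p)
    is_slope = np.arange(beta.size) >= K  # intercepts are never penalized
    for _ in range(max_iter):
        S = np.zeros(beta.size)               # estimating function (score)
        H = np.zeros((beta.size, beta.size))  # Fisher information
        for x, y in zip(X, Y):
            D = design(x, K)
            expeta = np.exp(D @ beta)
            pi = expeta / (1.0 + expeta.sum())  # probabilities of categories 1..K
            ystar = (np.arange(1, J) == y).astype(float)
            S += D.T @ (ystar - pi)
            H += D.T @ (np.diag(pi) - np.outer(pi, pi)) @ D
        # MM (perturbed local quadratic) weights; zero for the intercepts
        E = np.where(is_slope, scad_deriv(beta, lam) / (eps + np.abs(beta)), 0.0)
        # Fisher-scoring step on the penalized estimating equation
        step = np.linalg.solve(H + n_subjects * np.diag(E), S - n_subjects * E * beta)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    beta[is_slope & (np.abs(beta) < 1e-3)] = 0.0  # hypothetical zero threshold
    return beta
```

With `lam=0` the update reduces to ordinary Fisher scoring for the multinomial logit model. With a very large `lam` every slope is held at zero, and only the intercepts remain; these equal the log ratios of each category's frequency to the baseline frequency.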


Data availability

An example dataset is available at http://statgen.snu.ac.kr/software/PGEE_M/.

Code availability

An implementation of PGEE_M is available at http://statgen.snu.ac.kr/software/PGEE_M/.

Acknowledgements

The authors would like to thank Apio Catherine for English editing.

Funding

This research was supported by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (Grant number: HI16C2037), and the Bio & Medical Technology Development Program of the National Research Foundation of Korea (NRF) (Grant number: 2013M3A9C4078158).

Author information

Authors and Affiliations

Department of Statistics, Seoul National University, Seoul, 08826, Republic of Korea
Md. Kamruzzaman & Taesung Park
Department of Nutritional Science and Food Management, Ewha Womans University, Seoul, 120-750, Republic of Korea
Oran Kwon
Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
Taesung Park

Contributions

TP developed the idea and the method, and MK performed all analyses. MK and TP drafted the manuscript. TP revised the manuscript. OK provided the real data.

Corresponding author

Correspondence to Taesung Park.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Ethics approval

The protocol was approved by the Institutional Review Board of Ewha Womans University (No 61–12). It is also registered in the World Health Organization (WHO) International Clinical Trials Registry Platform (ICTRP, No. KCT0001241).

Consent to participate

All participants have provided consent.

About this article

Cite this article

Kamruzzaman, M., Kwon, O. & Park, T. Penalized generalized estimating equations approach to longitudinal data with multinomial responses. J. Korean Stat. Soc. 50, 844–859 (2021). https://doi.org/10.1007/s42952-021-00134-4

Received: 24 February 2021
Accepted: 05 June 2021
Published: 20 July 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s42952-021-00134-4
