Abstract
In high-dimensional longitudinal data with multinomial responses, the number of covariates is often much larger than the number of subjects, so variable selection is an important issue when modelling such data. In this study, we developed a penalized generalized estimating equation approach for multinomial responses that identifies important variables and estimates their regression coefficients simultaneously. The penalized estimating equation is solved with an iterative algorithm that combines the Fisher-scoring algorithm with the minorization-maximization (MM) algorithm. The penalty term regularizes only the slope parameters, because the category-specific intercept terms should always be included in the multinomial model. We conducted a simulation study to investigate the performance of the proposed method and demonstrated its use on a real dataset.
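To make the "slope-only" penalization concrete, the following is a minimal sketch of how MM-style penalty weights could be formed so that intercepts are left unpenalized. It is not the authors' implementation. The SCAD penalty, the constant `a = 3.7`, the perturbation `eps`, and the function names `scad_derivative` and `mm_penalty_weights` are all assumptions made for illustration; the abstract does not state which penalty is used.

```python
def scad_derivative(theta, lam, a=3.7):
    """First derivative of the SCAD penalty at theta >= 0.

    Assumption: SCAD is used purely for illustration; the abstract
    does not name a specific penalty.
    """
    if theta <= lam:
        return lam
    # Decays linearly to zero on (lam, a*lam], then stays at zero.
    return max(a * lam - theta, 0.0) / (a - 1.0)


def mm_penalty_weights(beta, n_intercepts, lam, eps=1e-6):
    """Diagonal MM weights q(|b_j|) / (eps + |b_j|) for each coefficient.

    The first `n_intercepts` entries of `beta` are treated as
    category-specific intercepts and receive weight 0 (no penalty).
    `eps` is a small hypothetical perturbation that keeps the weight
    finite when a slope is exactly zero.
    """
    weights = []
    for j, b in enumerate(beta):
        if j < n_intercepts:
            weights.append(0.0)  # intercepts are never shrunk
        else:
            weights.append(scad_derivative(abs(b), lam) / (eps + abs(b)))
    return weights


# Hypothetical coefficient vector: two intercepts followed by three slopes.
beta = [0.5, -0.2, 0.0, 1.0, 0.2]
weights = mm_penalty_weights(beta, n_intercepts=2, lam=0.1)
```

In a Fisher-scoring step, weights of this kind would typically be added to the diagonal of the information-type matrix, so a slope near zero receives a very large weight and is pushed towards zero, while large slopes and all intercepts are left essentially unshrunk.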
Data availability
An example dataset is available at http://statgen.snu.ac.kr/software/PGEE_M/.
Code availability
An implementation of PGEE_M is available at http://statgen.snu.ac.kr/software/PGEE_M/.
Acknowledgements
The authors would like to thank Apio Catherine for English editing.
Funding
This research was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (Grant number: HI16C2037), and by the Bio & Medical Technology Development Program of the National Research Foundation of Korea (NRF) (Grant number: 2013M3A9C4078158).
Contributions
TP developed the idea and the method. MK performed all analyses. MK and TP drafted the manuscript. TP revised the manuscript. OK provided the real data.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Ethics approval
The protocol was approved by the Institutional Review Board of Ewha Womans University (No 61–12). It is also registered in the World Health Organization (WHO) International Clinical Trials Registry Platform (ICTRP, No. KCT0001241).
Consent to participate
All participants have provided consent.
Cite this article
Kamruzzaman, M., Kwon, O. & Park, T. Penalized generalized estimating equations approach to longitudinal data with multinomial responses. J. Korean Stat. Soc. 50, 844–859 (2021). https://doi.org/10.1007/s42952-021-00134-4
DOI: https://doi.org/10.1007/s42952-021-00134-4