Generalized k-means in GLMs with applications to the outbreak of COVID-19 in the United States
Computational Statistics & Data Analysis (IF 1.8), Pub Date: 2021-03-10, DOI: 10.1016/j.csda.2021.107217
Tonglin Zhang, Ge Lin

Generalized k-means can be combined with any similarity or dissimilarity measure for clustering. Using the well-known likelihood ratio or F-statistic as the dissimilarity measure, a generalized k-means method is proposed to group generalized linear models (GLMs) for exponential family distributions. Given the number of clusters k, the proposed method is based on the uniformly most powerful unbiased (UMPU) test statistic for comparing GLMs. If k is unknown, the proposed method can be combined with the generalized information criterion (GIC) to select the best k for clustering automatically. Both AIC and BIC are investigated as special cases of GIC. Theoretical and simulation results show that BIC identifies the number of clusters correctly, whereas AIC does not. The proposed method is applied to state-level daily COVID-19 data in the United States, where it identifies 6 clusters. A further study shows that the models in different clusters differ significantly from one another, which supports the 6-cluster result.
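
To make the idea concrete, here is a minimal sketch in Python, not the authors' implementation. It assumes the simplest GLM (a Gaussian model with identity link and one covariate), uses the residual sum of squares as a stand-in for the likelihood-ratio dissimilarity, and scores each k with a GIC of the form n log(RSS/n) + penalty × (number of coefficients). The function names (`fit_line`, `cluster_models`, `gic`), the random restarts, and the simulated data are all illustrative assumptions.

```python
import math
import random

def fit_line(points):
    """Least-squares (intercept, slope) for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return (my - slope * mx, slope)

def rss(points, coef):
    """Residual sum of squares of the points under a given line."""
    a, b = coef
    return sum((y - a - b * x) ** 2 for x, y in points)

def cluster_models(units, k, restarts=20, max_iter=50, seed=0):
    """k-means over regression models: assign each unit to the cluster
    whose pooled line fits it best, then refit each cluster's line."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        # Assumed initialisation: lines fitted to k randomly chosen units.
        coefs = [fit_line(u) for u in rng.sample(units, k)]
        labels = None
        for _step in range(max_iter):
            new = [min(range(k), key=lambda c: rss(u, coefs[c])) for u in units]
            if new == labels:
                break  # assignments are stable
            labels = new
            for c in range(k):
                pooled = [p for u, g in zip(units, labels) if g == c for p in u]
                if pooled:  # an empty cluster keeps its previous line
                    coefs[c] = fit_line(pooled)
        total = sum(rss(u, coefs[g]) for u, g in zip(units, labels))
        if best is None or total < best[2]:
            best = (labels, coefs, total)
    return best

def gic(total_rss, n_obs, k, penalty):
    """Gaussian GIC up to a constant; each cluster has 2 coefficients.
    penalty=2 gives AIC, penalty=log(n_obs) gives BIC."""
    return n_obs * math.log(total_rss / n_obs) + penalty * 2 * k

# Simulated data (hypothetical): 3 true models, 4 units each, 10 points per unit.
noise = random.Random(1)
truth = [(0.0, 1.0), (5.0, -1.0), (40.0, 0.2)]
units = [[(x, a + b * x + noise.gauss(0, 0.5)) for x in range(10)]
         for a, b in truth for _ in range(4)]
n_obs = sum(len(u) for u in units)

bic = {}
for k in range(1, 6):
    _, _, total = cluster_models(units, k)
    bic[k] = gic(total, n_obs, k, math.log(n_obs))
labels3 = cluster_models(units, 3)[0]
best_k = min(bic, key=bic.get)
print("k chosen by BIC:", best_k)
print("labels for k=3:", labels3)
```

Calling `gic` with `penalty=2` gives the AIC variant for comparison. For count data such as daily case numbers, a Poisson GLM with a deviance-based dissimilarity would be a closer analogue, at the cost of an iterative fit in place of `fit_line`.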




Updated: 2021-03-15