A comparison of methods for clustering longitudinal data with slowly changing trends, Communications in Statistics - Simulation and Computation


Communications in Statistics - Simulation and Computation (IF 0.9), Pub Date: 2021-01-19, DOI: 10.1080/03610918.2020.1861464
N. G. P. Den Teuling (1, 2), S. C. Pauws (2, 3), E. R. van den Heuvel (1)


Abstract

Longitudinal clustering provides a detailed yet comprehensible description of time profiles among subjects. Several approaches are commonly used for this purpose, but it remains unclear under which conditions one method is preferred over another. We investigated the performance of five methods using Monte Carlo simulations on synthetic datasets representing various scenarios involving polynomial time profiles. Performance was evaluated on two aspects: the agreement of the group assignment with the simulated reference, as measured by the split-join distance, and the trend estimation error, as measured by a weighted minimum of the mean squared error (WMMSE). Growth mixture modeling (GMM) was found to achieve the best overall performance, followed closely by a two-step approach using growth curve modeling and k-means (GCKM). Considering the model similarities between GMM and GCKM, the latter is preferred for large datasets because of its computational efficiency. Longitudinal k-means (KML) and group-based trajectory modeling were found to yield practically identical solutions when the group trajectory model of the latter method is correctly specified. Both methods performed worse than GMM and GCKM in most settings.
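
The abstract names the split-join distance as its measure of agreement between an estimated group assignment and the simulated reference. As a rough illustration only, the sketch below follows the standard definition of that distance (the sum of the two projection distances between partitions); it is not taken from the paper, and the function name `split_join_distance` and the example labels are assumptions made for this example.

```python
from collections import Counter


def split_join_distance(labels_a, labels_b):
    """Split-join distance between two partitions of the same subjects.

    Each argument is a sequence of cluster labels, one per subject.
    Label values only need to be hashable; the two partitions may use
    different label names.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)

    # Contingency counts: number of subjects shared by cluster a and cluster b.
    overlap = Counter(zip(labels_a, labels_b))

    # For every cluster, record the size of its largest overlap with any
    # cluster of the other partition.
    best_for_a = {}
    best_for_b = {}
    for (a, b), count in overlap.items():
        best_for_a[a] = max(best_for_a.get(a, 0), count)
        best_for_b[b] = max(best_for_b.get(b, 0), count)

    # Projection distance = subjects not covered by the best-matching cluster.
    dist_a_to_b = n - sum(best_for_a.values())
    dist_b_to_a = n - sum(best_for_b.values())
    return dist_a_to_b + dist_b_to_a


# Hypothetical example: six subjects, one of them assigned to the wrong group.
reference = [0, 0, 0, 1, 1, 1]
estimated = ["x", "x", "y", "y", "y", "y"]
print(split_join_distance(reference, estimated))  # prints 2
```

A distance of 0 means the two partitions are identical up to relabeling. Larger values mean more subjects would have to be moved to turn one partition into the other.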


Updated: 2021-01-19
