Clustering multivariate data using factor analytic Bayesian mixtures with an unknown number of components,Statistics and Computing

Statistics and Computing (IF 1.6), Pub Date: 2019-08-27, DOI: 10.1007/s11222-019-09891-z
Panagiotis Papastamoulis

Recent work on overfitting Bayesian mixtures of distributions offers a powerful framework for clustering multivariate data using a latent Gaussian model which resembles the factor analysis model. The flexibility provided by overfitting mixture models yields a simple and efficient way to estimate the unknown number of clusters and the model parameters by Markov chain Monte Carlo sampling. The present study extends this approach by considering a set of eight parameterizations, giving rise to parsimonious representations of the covariance matrix per cluster. A Gibbs sampler combined with a prior parallel tempering scheme is implemented to approximately sample from the posterior distribution of the overfitting mixture. The parameterization and number of factors are selected according to the Bayesian information criterion. Identifiability issues related to label switching are dealt with by post-processing the simulated output with the Equivalence Classes Representatives algorithm. The contributed method and software are demonstrated and compared to similar models estimated using the expectation–maximization algorithm on simulated and real datasets. The software is available online at https://CRAN.R-project.org/package=fabMix.
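
To make the idea of a parsimonious per-cluster covariance and of selection by the Bayesian information criterion more concrete, here is a minimal Python sketch. It is not the fabMix implementation (fabMix is an R package). It assumes the standard factor-analytic form, in which a cluster covariance is written as Lambda Lambda^T + diag(psi) with a p x q loading matrix Lambda, and it assumes a lower-triangular constraint on Lambda when counting free parameters. All function names are hypothetical, and the BIC sign convention used here (lower is better) is a common generic one that may differ from the one used in the software.

```python
import math


def implied_covariance(loadings, uniquenesses):
    """Return Lambda Lambda^T + diag(psi) as a p x p nested list.

    loadings: p x q nested list (assumed factor loadings of one cluster)
    uniquenesses: length-p list of positive error variances
    """
    p = len(loadings)
    q = len(loadings[0])
    cov = [
        [sum(loadings[i][k] * loadings[j][k] for k in range(q)) for j in range(p)]
        for i in range(p)
    ]
    for i in range(p):
        cov[i][i] += uniquenesses[i]
    return cov


def n_cov_params_factor(p, q):
    """Free covariance parameters under the assumed factor-analytic form.

    p*q loadings, minus q*(q-1)/2 entries fixed by the assumed
    lower-triangular constraint, plus p uniquenesses.
    """
    return p * q - q * (q - 1) // 2 + p


def n_cov_params_full(p):
    """Free parameters of an unrestricted symmetric p x p covariance."""
    return p * (p + 1) // 2


def bic(log_likelihood, n_params, n_obs):
    """Generic BIC: -2 log L + k log n (lower is better in this convention)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)
```

For example, with p = 10 variables and q = 2 factors, `n_cov_params_factor(10, 2)` counts 29 covariance parameters per cluster, against 55 from `n_cov_params_full(10)`. In a selection loop, one would compute `bic` for each candidate parameterization and number of factors and keep the candidate with the smallest value.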


Updated: 2019-08-27
