Bayesian variable selection in clustering high-dimensional data via a mixture of finite mixtures
Journal of Statistical Computation and Simulation (IF 1.1), Pub Date: 2021-03-30, DOI: 10.1080/00949655.2021.1902526
Woojin Doo, Heeyoung Kim

When clustering high-dimensional data, it is often important to identify the variables that discriminate between clusters. A second common problem in clustering is determining the number of clusters. In this study, we propose a new method that performs clustering and variable selection simultaneously while inferring the number of clusters from the data. We formulate the clustering problem as a finite mixture model with a symmetric Dirichlet prior on the mixture weights and a prior on the number of components; that is, we use a mixture of finite mixtures. We handle variable selection by introducing a latent binary vector whose entries indicate whether each variable is included or excluded. We update this binary vector using a Metropolis algorithm and perform inference on the cluster structure using a split–merge Markov chain Monte Carlo technique. We demonstrate the advantages of our method on simulated data and two real DNA microarray datasets.
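To make the variable-selection step concrete, the following is a minimal sketch of a Metropolis update for a latent binary inclusion vector, conditional on fixed cluster labels. It is not the authors' model: the abstract does not state the likelihood or priors, so the sketch assumes unit-variance Gaussian data, a N(0, tau2) prior on each mean (integrated out analytically), and independent Bernoulli(w) priors on the inclusion indicators. The names `log_marginal`, `log_post_var`, `metropolis_sweep`, `tau2`, and `w` are all hypothetical. The split–merge update of the cluster labels and the prior on the number of components are not shown.

```python
import numpy as np

def log_marginal(x, tau2=10.0):
    """Log marginal likelihood of x_i ~ N(mu, 1) with mu ~ N(0, tau2) integrated out.

    Assumed toy model, not taken from the paper.
    """
    n = x.size
    s, ss = x.sum(), (x ** 2).sum()
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * np.log1p(n * tau2)
            - 0.5 * (ss - tau2 * s ** 2 / (1 + n * tau2)))

def log_post_var(xj, z, included, w=0.1):
    """Unnormalized log posterior term for one variable given its inclusion state."""
    if included:
        # Discriminating variable: each cluster has its own mean.
        ll = sum(log_marginal(xj[z == k]) for k in np.unique(z))
        return ll + np.log(w)
    # Non-discriminating variable: one mean shared by all observations.
    return log_marginal(xj) + np.log1p(-w)

def metropolis_sweep(X, z, gamma, rng, w=0.1):
    """One sweep of single-flip Metropolis updates over the inclusion vector gamma.

    X: (n, p) data, z: (n,) cluster labels, gamma: (p,) boolean array.
    The flip proposal is symmetric, so the acceptance ratio is the posterior ratio.
    Under the assumed model, variables are independent given z, so only
    variable j's term changes when gamma[j] is flipped.
    """
    gamma = gamma.copy()
    for j in rng.permutation(X.shape[1]):
        cur = log_post_var(X[:, j], z, gamma[j], w)
        prop = log_post_var(X[:, j], z, not gamma[j], w)
        if np.log(rng.uniform()) < prop - cur:
            gamma[j] = not gamma[j]
    return gamma
```

In a full sampler, a sweep like this would alternate with updates of the cluster labels `z`, so that the selected variables and the cluster structure inform each other.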




Updated: 2021-03-30