Block-Wise Variable Selection for Clustering Via Latent States of Mixture Models
Journal of Computational and Graphical Statistics (IF 2.4), published 2021-11-17, DOI: 10.1080/10618600.2021.1982724
Beomseok Seo, Lin Lin, Jia Li

Abstract

Mixture modeling is a major paradigm for clustering in statistics. In this article, we develop a new block-wise variable selection method for clustering that exploits the latent states of either the hidden Markov model on variable blocks or the Gaussian mixture model. The variable blocks are formed by a depth-first search on a dendrogram constructed from the mutual information between every pair of variables. We demonstrate that the latent states of the variable blocks, together with the mixture model parameters, represent the original data effectively and much more compactly. We therefore cluster the data using the latent states and select variables according to the relationship between the states and the clusters. Because the true class labels are unknown in the unsupervised setting, we first generate more refined clusters, called semi-clusters, for variable selection, and then determine the final clusters from the dimension-reduced data. Experiments on simulated and real data show that the new method is highly competitive in clustering accuracy compared with several widely used methods. Supplementary materials for this article are available online.
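The block-formation step (pairwise mutual information, then a dendrogram, then a depth-first search) can be illustrated with a small standard-library sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: mutual information is estimated under a bivariate Gaussian assumption as -0.5 * log(1 - r^2), the dendrogram is built by average-linkage agglomeration on that similarity, and the depth-first search emits a subtree as one block once it has at most `max_block` variables. The names `gaussian_mi`, `build_dendrogram`, `dfs_blocks` and `max_block` are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi(x, y):
    """Mutual information (nats) under an assumed bivariate Gaussian model."""
    r2 = min(pearson(x, y) ** 2, 1 - 1e-12)  # guard against log(0)
    return -0.5 * math.log(1 - r2)


def build_dendrogram(columns):
    """Average-linkage agglomeration on pairwise MI (hypothetical choice).

    Leaves are column indices; internal nodes are (left, right) tuples.
    """
    p = len(columns)
    mi = {}
    for i, j in combinations(range(p), 2):
        mi[i, j] = mi[j, i] = gaussian_mi(columns[i], columns[j])
    clusters = [(i, [i]) for i in range(p)]  # (subtree, leaf indices)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            la, lb = clusters[a][1], clusters[b][1]
            avg = sum(mi[i, j] for i in la for j in lb) / (len(la) * len(lb))
            if best is None or avg > best[0]:
                best = (avg, a, b)
        _, a, b = best
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][0]


def leaves(node):
    """All column indices under a dendrogram node, left to right."""
    if isinstance(node, int):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def dfs_blocks(node, max_block):
    """Depth-first traversal: a subtree becomes one block once it is small enough."""
    members = leaves(node)
    if len(members) <= max_block:
        return [sorted(members)]
    left, right = node
    return dfs_blocks(left, max_block) + dfs_blocks(right, max_block)


# Toy data: columns 1 and 3 are near-copies of columns 0 and 2.
t = range(20)
a = [math.sin(0.7 * k) for k in t]
b = [math.cos(1.3 * k) for k in t]
cols = [
    a,
    [v + 0.05 * math.cos(3.1 * k) for k, v in zip(t, a)],
    b,
    [v + 0.05 * math.sin(2.9 * k) for k, v in zip(t, b)],
]
tree = build_dendrogram(cols)
blocks = dfs_blocks(tree, max_block=2)
print(blocks)
```

On this toy data, each near-duplicate pair shares a block. The order of the two blocks depends on which pair is merged first. The depth-first order of the emitted blocks could also serve as the chain order for a hidden Markov model on variable blocks, but that use is an assumption here.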
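The claim that latent states plus mixture parameters give a compact representation of the data can be illustrated as follows. Each block of a row is encoded by the index of the mixture component with the highest posterior probability and decoded by that component's mean. This is only a hypothetical sketch, not the paper's procedure. It assumes that a diagonal-covariance Gaussian mixture has already been fitted to each block and that the variable blocks are given. The names `encode`, `decode`, `map_state` and the `params` layout (weights, means, variances per block) are illustrative.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def map_state(x, weights, means, variances):
    """Index of the component with the largest posterior for sub-vector x."""
    scores = [math.log(w) + log_gauss_diag(x, m, v)
              for w, m, v in zip(weights, means, variances)]
    return max(range(len(scores)), key=scores.__getitem__)


def encode(rows, blocks, params):
    """Replace each row by one integer latent state per variable block."""
    return [[map_state([row[j] for j in block], *params[k])
             for k, block in enumerate(blocks)] for row in rows]


def decode(states, blocks, params, p):
    """Approximate each row by the means of its blocks' selected components."""
    out = []
    for s in states:
        row = [0.0] * p
        for k, block in enumerate(blocks):
            for j, m in zip(block, params[k][1][s[k]]):
                row[j] = m
        out.append(row)
    return out


# Hypothetical fitted parameters: block 0 = variables {0, 1}, block 1 = {2}.
blocks = [[0, 1], [2]]
params = [
    ([0.5, 0.5], [[0, 0], [5, 5]], [[1, 1], [1, 1]]),
    ([0.5, 0.5], [[-3], [3]], [[1], [1]]),
]
rows = [[0.2, -0.1, 2.5], [4.8, 5.3, -3.2]]
states = encode(rows, blocks, params)
print(states)
print(decode(states, blocks, params, p=3))
```

Each three-dimensional row shrinks to two small integers. Clustering can then operate on these state vectors instead of the raw values.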
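Selecting variables "according to the relationship between the states and the clusters" can be sketched by scoring each block with the empirical mutual information between its latent states and the semi-cluster labels, then keeping blocks above a threshold. The abstract does not specify this criterion. Using mutual information, and the `threshold` value, are assumptions for illustration, and `discrete_mi` and `select_blocks` are hypothetical names.

```python
import math
from collections import Counter


def discrete_mi(u, v):
    """Empirical mutual information (nats) between two label sequences."""
    n = len(u)
    pu, pv, puv = Counter(u), Counter(v), Counter(zip(u, v))
    # sum over joint cells of p(a,b) * log(p(a,b) / (p(a) p(b)))
    return sum((c / n) * math.log(c * n / (pu[a] * pv[b]))
               for (a, b), c in puv.items())


def select_blocks(block_states, semi_clusters, threshold):
    """Indices of blocks whose latent states are informative about the semi-clusters."""
    return [k for k, states in enumerate(block_states)
            if discrete_mi(states, semi_clusters) > threshold]


# Toy labels for 9 observations.
semi = [0, 0, 0, 1, 1, 1, 2, 2, 2]
block0 = [0, 0, 0, 1, 1, 1, 1, 1, 1]  # partly tracks the semi-clusters
block1 = [0, 1, 2, 0, 1, 2, 0, 1, 2]  # unrelated to the semi-clusters
block2 = list(semi)                   # identical to the semi-clusters
print(select_blocks([block0, block1, block2], semi, threshold=0.1))
```

Block 1 carries no information about the semi-clusters, so its variables are dropped. The final clustering would then use only the retained blocks.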




Updated: 2021-11-17