Latent class analysis variable selection
Annals of the Institute of Statistical Mathematics (IF 0.8), Pub Date: 2009-07-24, DOI: 10.1007/s10463-009-0258-9
Nema Dean, Adrian E. Raftery

We propose a method for selecting variables in latent class analysis, which is the most common model-based clustering method for discrete data. The method assesses a variable’s usefulness for clustering by comparing two models, given the clustering variables already selected. In one model the variable contributes information about cluster allocation beyond that contained in the already selected variables, and in the other model it does not. A headlong search algorithm is used to explore the model space and select clustering variables. In simulated datasets we found that the method selected the correct clustering variables, and also led to improvements in classification performance and in accuracy of the choice of the number of classes. In two real datasets, our method discovered the same group structure with fewer variables. In a dataset from the International HapMap Project consisting of 639 single nucleotide polymorphisms (SNPs) from 210 members of different groups, our method discovered the same group structure with a much smaller number of SNPs.
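
The search described above can be illustrated with a minimal sketch. It assumes a first-improvement ("headlong") stepwise scheme: a candidate is added as soon as its evidence exceeds a threshold, and a selected variable is dropped as soon as its evidence falls to or below it. The `evidence` callback, the `threshold` value, and the toy scoring function are hypothetical stand-ins for the paper's comparison of the two models; the abstract does not specify them, and no latent class model is fitted here.

```python
from typing import Callable, List, Sequence


def headlong_search(
    variables: Sequence[str],
    evidence: Callable[[str, List[str]], float],
    threshold: float = 0.0,
    max_steps: int = 1000,
) -> List[str]:
    """Return the variables kept by a first-improvement stepwise search.

    evidence(v, selected) is assumed to score how strongly variable v adds
    clustering information beyond the variables in `selected` (for example,
    a difference in a model-comparison criterion between the two models).
    """
    selected: List[str] = []
    for _ in range(max_steps):  # guard against cycling between add and drop
        changed = False
        # Inclusion step: accept the first candidate with enough evidence.
        for v in variables:
            if v not in selected and evidence(v, selected) > threshold:
                selected.append(v)
                changed = True
                break
        # Removal step: drop the first selected variable that no longer helps.
        for v in list(selected):
            others = [u for u in selected if u != v]
            if evidence(v, others) <= threshold:
                selected.remove(v)
                changed = True
                break
        if not changed:
            break
    return selected


def toy_evidence(variable: str, selected: List[str]) -> float:
    # Hypothetical scoring: pretend only "a" and "c" carry cluster information.
    return 1.0 if variable in {"a", "c"} else -1.0


print(headlong_search(["a", "b", "c", "d"], toy_evidence))  # prints ['a', 'c']
```

In a real application the `evidence` function would fit and compare the two latent class models for each candidate, which is the expensive part; stopping at the first acceptable variable rather than scanning for the best one is what keeps such a search affordable when there are hundreds of variables, as in the SNP example.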

Updated: 2009-07-24