Bayesian nonparametric clustering as a community detection problem
Computational Statistics & Data Analysis (IF 1.5), Pub Date: 2020-12-01, DOI: 10.1016/j.csda.2020.107044
Stefano F. Tonellato

It is well known that a wide class of Bayesian nonparametric priors leads to a representation of the distribution of the observable variables as a mixture density with an infinite number of components, and that such a representation induces a clustering structure in the observations. However, identifying clusters a posteriori is not straightforward, and some post-processing is usually required. To circumvent label switching, pairwise posterior similarity has been introduced and used either to apply classical clustering algorithms or to estimate the underlying partition by minimising a suitable loss function. This paper proposes mapping observations onto a weighted undirected graph, where each node represents a sample item and edge weights are given by the posterior pairwise similarities. It shows how, after building a particular random walk on such a graph, a community detection algorithm known as the map equation method can be applied by optimising the description length of the partition. A relevant feature of this method is that it allows both the posterior uncertainty of the classification to be quantified and the variables used for classification to be selected.
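The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that posterior cluster labels are available as an array of MCMC draws, that the random walk is the standard one with transition probabilities proportional to edge weights (the abstract does not specify the "particular random walk" it builds), and that the description length is the usual two-level map equation. The function names and the toy label draws are hypothetical.

```python
import numpy as np

def posterior_similarity(labels):
    """labels: (n_draws, n_items) array of cluster labels, one row per MCMC draw.
    Returns the matrix of co-clustering frequencies, which does not depend
    on how clusters are labelled within each draw."""
    labels = np.asarray(labels)
    # same[t, i, j] is True when items i and j share a cluster in draw t
    same = labels[:, :, None] == labels[:, None, :]
    return same.mean(axis=0)

def random_walk(similarity):
    """Assumed walk: step probabilities proportional to edge weights.
    Returns the transition matrix and its stationary distribution.
    Every node is assumed to have at least one positive edge weight."""
    w = np.array(similarity, dtype=float)
    np.fill_diagonal(w, 0.0)            # drop self-loops
    strength = w.sum(axis=1)            # weighted degree of each node
    transition = w / strength[:, None]
    stationary = strength / strength.sum()
    return transition, stationary

def plogp_sum(x):
    """Sum of x * log2(x) over the positive entries of x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    x = x[x > 0]
    return float(np.sum(x * np.log2(x)))

def description_length(transition, stationary, partition):
    """Two-level map equation for a hard partition of the nodes (in bits)."""
    partition = np.asarray(partition)
    # flow[a, b]: probability of observing the step a -> b at stationarity
    flow = stationary[:, None] * transition
    modules = np.unique(partition)
    exit_p = np.array([flow[partition == m][:, partition != m].sum()
                       for m in modules])
    within = np.array([stationary[partition == m].sum() for m in modules])
    return (plogp_sum(exit_p.sum()) - 2.0 * plogp_sum(exit_p)
            - plogp_sum(stationary) + plogp_sum(exit_p + within))

# Hypothetical label draws for four items; labels are arbitrary within a draw.
draws = np.array([
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [2, 2, 5, 5],
])
psm = posterior_similarity(draws)
P, pi = random_walk(psm)
dl_a = description_length(P, pi, [0, 0, 1, 1])
dl_b = description_length(P, pi, [0, 1, 0, 1])
dl_one = description_length(P, pi, [0, 0, 0, 0])
```

In this toy example items 0 and 1 are clustered together in every draw, so their similarity is 1.0, while items 2 and 3 share a cluster in three of the four draws, giving 0.75. The partition that keeps each of these pairs together has a shorter description length than the one that splits them. A community detection routine would search over partitions for the minimum instead of comparing a few candidates by hand.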

Updated: 2020-12-01