Search Algorithms and Loss Functions for Bayesian Clustering
Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2022-05-26, DOI: 10.1080/10618600.2022.2069779
David B. Dahl, Devin J. Johnson, Peter Müller
Abstract

We propose a randomized greedy search algorithm to find a point estimate for a random partition based on a loss function and posterior Monte Carlo samples. Given the large size and awkward discrete nature of the search space, the minimization of the posterior expected loss is challenging. Our approach is a stochastic search based on a series of greedy optimizations performed in a random order and is embarrassingly parallel. We consider several loss functions, including Binder loss and variation of information. We note that criticisms of Binder loss are the result of using equal penalties of misclassification and we show an efficient means to compute Binder loss with potentially unequal penalties. Furthermore, we extend the original variation of information to allow for unequal penalties and show no increased computational costs. We provide a reference implementation of our algorithm. Using a variety of examples, we show that our method produces clustering estimates that better minimize the expected loss and are obtained faster than existing methods. Supplementary materials for this article are available online.
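The authors provide their own reference implementation; the sketch below is only an illustration of the core idea, not their code. It uses the standard reduction of the expected Binder loss to the posterior pairwise co-clustering (similarity) matrix, with separate penalties `a` (splitting a truly co-clustered pair) and `b` (merging a truly separated pair), and one greedy pass that reallocates items in a random order. All function names and the similarity-matrix construction are illustrative assumptions.

```python
import numpy as np

def expected_binder(labels, psm, a=1.0, b=1.0):
    # Expected Binder loss given the posterior similarity matrix psm,
    # where psm[i, j] is the posterior probability that items i and j
    # are co-clustered. Penalty a: pair split that should be together;
    # penalty b: pair merged that should be apart.
    same = labels[:, None] == labels[None, :]
    iu = np.triu_indices(len(labels), k=1)          # each pair once
    return (a * psm[iu] * ~same[iu] + b * (1 - psm[iu]) * same[iu]).sum()

def greedy_pass(labels, psm, a=1.0, b=1.0, rng=None):
    # One greedy sweep: visit items in random order and move each to the
    # existing or new cluster that minimizes the expected loss. Several
    # such passes from independent random starts can run in parallel.
    rng = np.random.default_rng(rng)
    labels = labels.copy()
    for i in rng.permutation(len(labels)):
        candidates = list(np.unique(labels)) + [labels.max() + 1]
        losses = []
        for k in candidates:
            labels[i] = k
            losses.append(expected_binder(labels, psm, a, b))
        labels[i] = candidates[int(np.argmin(losses))]
    return labels
```

In practice the similarity matrix is estimated by averaging co-clustering indicators over the posterior Monte Carlo samples, and the pass is iterated until the partition stops changing; the best result over many random restarts is reported.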


