Scalable Bayesian Nonparametric Clustering and Classification
Journal of Computational and Graphical Statistics (IF 1.4). Pub Date: 2019-07-19. DOI: 10.1080/10618600.2019.1624366
Yang Ni (1, 2), Peter Müller (3), Maurice Diesendruck (2), Sinead Williamson (4), Yitan Zhu (5), Yuan Ji (6)

Abstract: We develop a scalable multistep Monte Carlo algorithm for inference under a large class of nonparametric Bayesian models for clustering and classification. Each step is "embarrassingly parallel" and can be implemented using the same Markov chain Monte Carlo sampler. The simplicity and generality of our approach make inference under a wide range of Bayesian nonparametric mixture models feasible for large datasets. Specifically, we apply the approach to inference under a product partition model with regression on covariates. We show results for inference with two motivating datasets: a large set of electronic health records and a bank telemarketing dataset. We find interesting clusters and classification performance competitive with other widely used classifiers. Supplementary materials for this article are available online.
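The abstract does not spell out the individual steps, so the following is only a minimal sketch of how an embarrassingly parallel, multistep scheme that reuses one MCMC sampler can be organized. It is not the authors' algorithm. Assumptions: the data are randomly split into shards; the same sampler runs independently on each shard; each local cluster is then summarized by its mean, and the same sampler clusters those means to merge clusters across shards. The stand-in sampler is a collapsed Chinese-restaurant-process Gibbs sampler for a 1-D Dirichlet process mixture of normals with known variance, not the paper's product partition model with covariates. The names `crp_gibbs`, `multistep_cluster`, `alpha`, `sigma2`, `mu0` and `tau02` are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def log_normal_pdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def crp_gibbs(xs, alpha=1.0, sigma2=1.0, mu0=0.0, tau02=100.0, iters=50, seed=0):
    """Collapsed CRP Gibbs sampler for a 1-D DP mixture of N(mu_k, sigma2) with
    prior mu_k ~ N(mu0, tau02). Returns labels 0..K-1 from the last sweep.
    (Hypothetical stand-in for whatever sampler a real implementation uses.)"""
    rng = random.Random(seed)
    labels = [0] * len(xs)            # start with all points in one cluster
    counts, sums = {0: len(xs)}, {0: sum(xs)}
    next_label = 1
    for _ in range(iters):
        for i, x in enumerate(xs):
            k = labels[i]             # remove x from its current cluster
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]
            options, logw = [], []
            for c, n in counts.items():   # existing clusters: n * posterior predictive
                prec = 1.0 / tau02 + n / sigma2
                mean = (mu0 / tau02 + sums[c] / sigma2) / prec
                options.append(c)
                logw.append(math.log(n) + log_normal_pdf(x, mean, 1.0 / prec + sigma2))
            options.append(next_label)    # new cluster: alpha * prior predictive
            logw.append(math.log(alpha) + log_normal_pdf(x, mu0, tau02 + sigma2))
            m = max(logw)
            c = rng.choices(options, weights=[math.exp(v - m) for v in logw])[0]
            if c == next_label:
                next_label += 1
                counts[c], sums[c] = 0, 0.0
            labels[i] = c
            counts[c] += 1
            sums[c] += x
    relabel = {c: j for j, c in enumerate(sorted(set(labels)))}
    return [relabel[c] for c in labels]


def multistep_cluster(xs, n_shards=4, seed=0, **kw):
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    shards = [idx[s::n_shards] for s in range(n_shards)]

    # Step 1: the same sampler on every shard, with no communication between shards.
    def run(s):
        return crp_gibbs([xs[i] for i in shards[s]], seed=seed + s + 1, **kw)

    with ThreadPoolExecutor() as pool:
        shard_labels = list(pool.map(run, range(n_shards)))

    # Step 2: summarize each local cluster by its mean and cluster the means
    # (simplifying heuristic: means are treated as data with the same sigma2).
    reps, owner = [], {}
    for s, labs in enumerate(shard_labels):
        for k in sorted(set(labs)):
            members = [xs[shards[s][j]] for j, lab in enumerate(labs) if lab == k]
            owner[(s, k)] = len(reps)
            reps.append(sum(members) / len(members))
    rep_labels = crp_gibbs(reps, seed=seed, **kw)

    # Each point inherits the global label of its local cluster.
    out = [0] * len(xs)
    for s, labs in enumerate(shard_labels):
        for j, k in enumerate(labs):
            out[shards[s][j]] = rep_labels[owner[(s, k)]]
    return out


# Usage: two well-separated groups of 100 points each.
data_rng = random.Random(42)
data = [data_rng.gauss(-5, 1) for _ in range(100)] + [data_rng.gauss(5, 1) for _ in range(100)]
labels = multistep_cluster(data, n_shards=4, seed=1)
```

Threads are used here only to make the independence of the shard runs explicit. For CPU-bound samplers a process pool or separate machines would be the natural choice. With many shards, the second step could itself be re-sharded and repeated until the representatives fit into a single run.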

Updated: 2019-07-19