Consensus Monte Carlo for Random Subsets using Shared Anchors, Journal of Computational and Graphical Statistics


Consensus Monte Carlo for Random Subsets using Shared Anchors
Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2020-04-15, DOI: 10.1080/10618600.2020.1737085
Yang Ni, Yuan Ji, Peter Müller

Abstract: We present a consensus Monte Carlo algorithm that scales existing Bayesian nonparametric models for clustering and feature allocation to big data. The algorithm is valid for any prior on random subsets such as partitions and latent feature allocation, under essentially any sampling model. Motivated by three case studies, we focus on clustering induced by a Dirichlet process mixture sampling model, inference under an Indian buffet process prior with a binomial sampling model, and with a categorical sampling model. We assess the proposed algorithm with simulation studies and show results for inference with three datasets: an MNIST image dataset, a dataset of pancreatic cancer mutations, and a large set of electronic health records. Supplementary materials for this article are available online.

Updated: 2020-04-15