Sampling from Dirichlet process mixture models with unknown concentration parameter: mixing issues in large data implementations.
Statistics and Computing (IF 1.6), Pub Date: 2014-05-03, DOI: 10.1007/s11222-014-9471-3
David I Hastie, Silvia Liverani, Sylvia Richardson

We consider the question of Markov chain Monte Carlo sampling from a general stick-breaking Dirichlet process mixture model, with concentration parameter \(\alpha \). This paper introduces a Gibbs sampling algorithm that combines the slice sampling approach of Walker (Communications in Statistics - Simulation and Computation 36:45–54, 2007) and the retrospective sampling approach of Papaspiliopoulos and Roberts (Biometrika 95(1):169–186, 2008). Our general algorithm is implemented as efficient open source C++ software, available as an R package, and is based on a blocking strategy similar to that suggested by Papaspiliopoulos (A note on posterior sampling from Dirichlet mixture models, 2008) and implemented by Yau et al. (Journal of the Royal Statistical Society, Series B (Statistical Methodology) 73:37–57, 2011). We discuss the difficulties of achieving good mixing in MCMC samplers of this nature in large data sets and investigate sensitivity to initialisation. We additionally consider the challenges when an additional layer of hierarchy is added such that joint inference is to be made on \(\alpha \). We introduce a new label-switching move and compute the marginal partition posterior to help to surmount these difficulties. Our work is illustrated using a profile regression (Molitor et al. Biostatistics 11(3):484–498, 2010) application, where we demonstrate good mixing behaviour for both synthetic and real examples.
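As a rough illustration of the stick-breaking construction and of why a slice variable makes the infinite mixture tractable, the sketch below draws weights \(\psi_c = V_c \prod_{l<c}(1 - V_l)\) with \(V_c \sim \mathrm{Beta}(1, \alpha)\) and stops once the leftover stick mass falls below a slice threshold. It is not the authors' C++ implementation: the function name `stick_breaking_until` and the parameter `u_min` (standing in for the smallest slice variable across observations) are hypothetical names chosen for this example.

```python
import random


def stick_breaking_until(alpha, u_min, rng):
    """Draw stick-breaking weights until the leftover mass is below u_min.

    Assumption for illustration: u_min plays the role of the smallest slice
    variable. Every weight not yet generated is bounded by the leftover mass,
    so once leftover < u_min no further component can exceed the slice.
    """
    if alpha <= 0 or not (0.0 < u_min < 1.0):
        raise ValueError("need alpha > 0 and 0 < u_min < 1")
    weights = []
    remaining = 1.0  # mass of the stick that has not been broken off yet
    while remaining >= u_min:
        v = rng.betavariate(1.0, alpha)  # V_c ~ Beta(1, alpha)
        weights.append(remaining * v)    # psi_c = V_c * prod_{l<c} (1 - V_l)
        remaining *= 1.0 - v
    return weights, remaining


rng = random.Random(0)  # fixed seed so the run is reproducible
weights, remaining = stick_breaking_until(alpha=1.0, u_min=0.01, rng=rng)
print(len(weights), remaining)
```

Larger values of \(\alpha\) break off smaller pieces on average, so more components are needed before the leftover mass drops below the threshold. That is one informal way to see why uncertainty in \(\alpha\) affects how many components a sampler of this kind has to track.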

Updated: 2014-05-03