Subsampling sequential Monte Carlo for static Bayesian models
Statistics and Computing ( IF 2.2 ) Pub Date : 2020-09-09 , DOI: 10.1007/s11222-020-09969-z
David Gunawan , Khue-Dung Dang , Matias Quiroz , Robert Kohn , Minh-Ngoc Tran

We show how to speed up sequential Monte Carlo (SMC) for Bayesian inference in large data problems by data subsampling. SMC sequentially updates a cloud of particles through a sequence of distributions, beginning with a distribution that is easy to sample from such as the prior and ending with the posterior distribution. Each update of the particle cloud consists of three steps: reweighting, resampling, and moving. In the move step, each particle is moved using a Markov kernel; this is typically the most computationally expensive part, particularly when the dataset is large. It is crucial to have an efficient move step to ensure particle diversity. Our article makes two important contributions. First, in order to speed up the SMC computation, we use an approximately unbiased and efficient annealed likelihood estimator based on data subsampling. The subsampling approach is more memory efficient than the corresponding full data SMC, which is an advantage for parallel computation. Second, we use a Metropolis within Gibbs kernel with two conditional updates. A Hamiltonian Monte Carlo update makes distant moves for the model parameters, and a block pseudo-marginal proposal is used for the particles corresponding to the auxiliary variables for the data subsampling. We demonstrate both the usefulness and limitations of the methodology for estimating four generalized linear models and a generalized additive model with large datasets.
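The reweight, resample, and move cycle described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the model is a toy Gaussian mean with unit variance and a N(0, 10^2) prior, the annealing schedule is a fixed linear grid, the move step uses a random-walk Metropolis kernel in place of the Hamiltonian Monte Carlo update, and the likelihood is evaluated on the full data rather than estimated from a subsample. The names `smc_tempering`, `log_lik`, and `log_prior` are hypothetical.

```python
import numpy as np

PRIOR_SD = 10.0  # assumed prior standard deviation for the toy model


def log_lik(theta, y):
    """Full-data Gaussian log-likelihood (unit variance, constants dropped).

    theta has shape (N,) with one value per particle; y has shape (n,).
    """
    return -0.5 * ((y[None, :] - theta[:, None]) ** 2).sum(axis=1)


def log_prior(theta):
    """Log density of the N(0, PRIOR_SD^2) prior, constants dropped."""
    return -0.5 * (theta / PRIOR_SD) ** 2


def smc_tempering(y, n_particles=500, n_temps=20, n_mcmc=5, step=0.2, seed=0):
    """Annealed SMC from the prior (a = 0) to the posterior (a = 1)."""
    rng = np.random.default_rng(seed)
    temps = np.linspace(0.0, 1.0, n_temps + 1)  # assumed linear schedule
    theta = rng.normal(0.0, PRIOR_SD, size=n_particles)  # start at the prior

    for a_prev, a in zip(temps[:-1], temps[1:]):
        ll = log_lik(theta, y)

        # Reweight: incremental weight is likelihood^(a - a_prev).
        logw = (a - a_prev) * ll
        w = np.exp(logw - logw.max())  # subtract the max for stability
        w /= w.sum()

        # Resample: multinomial resampling proportional to the weights.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        theta, ll = theta[idx], ll[idx]

        # Move: random-walk Metropolis targeting prior * likelihood^a.
        for _ in range(n_mcmc):
            prop = theta + step * rng.normal(size=n_particles)
            ll_prop = log_lik(prop, y)
            log_ratio = a * (ll_prop - ll) + log_prior(prop) - log_prior(theta)
            accept = np.log(rng.uniform(size=n_particles)) < log_ratio
            theta = np.where(accept, prop, theta)
            ll = np.where(accept, ll_prop, ll)

    return theta  # equally weighted particles approximating the posterior
```

In this sketch the move step calls `log_lik` on all n observations for every particle and every Metropolis iteration, which is the cost the abstract identifies as dominant for large datasets and the part the paper proposes to replace with a subsampled likelihood estimator.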




Updated: 2020-09-10