Abstract
Allocating computation over multiple chains to reduce sampling time in MCMC is crucial to making MCMC more applicable to state-of-the-art models such as deep neural networks. One parallelization scheme for MCMC partitions the sample space and runs a separate MCMC chain in each component of the partition (VanDerwerken and Schmidler in Parallel Markov chain Monte Carlo. arXiv:1312.7479, 2013; Basse et al. in Artificial intelligence and statistics, pp 1318–1327, 2016). In this work, we take the bridge sampling approach of Basse et al. (2016) and apply constrained Hamiltonian Monte Carlo on partitioned sample spaces. We propose a random dimension partition scheme that combines well with the constrained HMC. We show empirically that this approach can expedite MCMC sampling for any unnormalized target distribution, such as a Bayesian neural network, in a high-dimensional setting. Furthermore, in the presence of multi-modality, this algorithm is expected to mix MCMC chains more efficiently when proper partition elements are chosen.
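The core building block described above, HMC constrained to one partition element, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it samples a standard Gaussian restricted to the half-space {x : x[0] ≥ 0} (one hypothetical partition element) by reflecting the leapfrog trajectory off the boundary, in the spirit of Neal (2011, constrained HMC). All function names are illustrative.

```python
import numpy as np

def grad_U(x):
    # Standard Gaussian target: U(x) = 0.5 * x @ x, so grad U(x) = x.
    return x

def leapfrog_reflect(x, p, step, n_steps):
    """Leapfrog integrator that reflects momentum off the x[0] = 0 boundary,
    keeping the trajectory inside the partition element {x : x[0] >= 0}."""
    x, p = x.copy(), p.copy()
    p -= 0.5 * step * grad_U(x)           # initial half-step for momentum
    for i in range(n_steps):
        x += step * p                      # full position step
        if x[0] < 0:                       # crossed the boundary: bounce back
            x[0] = -x[0]
            p[0] = -p[0]
        if i < n_steps - 1:
            p -= step * grad_U(x)          # full momentum step (interior)
    p -= 0.5 * step * grad_U(x)            # final half-step for momentum
    return x, p

def constrained_hmc(x0, n_samples, step=0.1, n_steps=20, seed=0):
    rng = np.random.default_rng(seed)
    x = x0.copy()
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(x.shape)   # resample momentum
        x_new, p_new = leapfrog_reflect(x, p, step, n_steps)
        # Metropolis correction on the total energy H = U + K.
        h_old = 0.5 * x @ x + 0.5 * p @ p
        h_new = 0.5 * x_new @ x_new + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h_old - h_new:
            x = x_new
        samples.append(x.copy())
    return np.array(samples)

samples = constrained_hmc(np.array([1.0, 0.0]), 2000)
```

In the full scheme, one such constrained chain would be run per partition element (with the partition drawn over random dimensions), and the per-element estimates combined with bridge sampling weights as in Basse et al. (2016).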
References
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., & Isard, M. et al. (2016). Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI). Savannah, Georgia, USA.
Alder, B. J., & Wainwright, T. E. (1959). Studies in molecular dynamics. I. General method. The Journal of Chemical Physics, 31, 459–466.
Basse, G., Smith, A., & Pillai, N. (2016). Parallel Markov chain Monte Carlo via spectral clustering. In Artificial intelligence and statistics (pp. 1318–1327), PMLR.
Blei, D. M., Kucukelbir, A., & McAuliffe, J. D. (2017). Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518), 859–877.
Bradford, R., & Thomas, A. (1996). Markov chain Monte Carlo methods for family trees using a parallel processor. Statistics and Computing, 6, 67–75.
Brockwell, A. E. (2006). Parallel Markov chain Monte Carlo simulation by pre-fetching. Journal of Computational and Graphical Statistics, 15, 246–261.
Byrd, J. M., Jarvis, S. A., & Bhalerao, A. H. (2008). Reducing the run-time of MCMC programs by multithreading on SMP architectures. In IEEE international symposium on parallel and distributed processing (pp. 1–8). IPDPS 2008.
Choo, K. (2000). Learning hyperparameters for neural network models using Hamiltonian dynamics. Ph.D. thesis.
Foreman-Mackey, D., Hogg, D. W., Lang, D., & Goodman, J. (2013). emcee: The MCMC hammer. Publications of the Astronomical Society of the Pacific, 125, 306.
Gelman, A., & Meng, X.-L. (1998). Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Statistical Science, 13(2), 163–185.
Glynn, P. W., & Heidelberger, P. (1992). Analysis of initial transient deletion for parallel steady-state simulations. SIAM Journal on Scientific and Statistical Computing, 13, 904–922.
Goodman, J., & Weare, J. (2010). Ensemble samplers with affine invariance. Communications in Applied Mathematics and Computational Science, 5, 65–80.
Gronau, Q. F., Sarafoglou, A., Matzke, D., Ly, A., Boehm, U., Marsman, M., Leslie, D. S., Forster, J. J., Wagenmakers, E.-J., & Steingroever, H. (2017). A tutorial on bridge sampling. arXiv preprint arXiv:1703.05984.
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Liu, J. S. (2008). Monte Carlo strategies in scientific computing. Berlin: Springer.
Neal, R. M. (2012). Bayesian learning for neural networks (Vol. 118). Berlin: Springer.
Neal, R. M. (2011). MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo, (pp. 113–162). Chapman and Hall/CRC.
Nishihara, R., Murray, I., & Adams, R. P. (2014). Parallel MCMC with generalized elliptical slice sampling. Journal of Machine Learning Research, 15, 2087–2112.
Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), 400–407.
Robert, C. P. (2004). Monte Carlo methods. Hoboken: Wiley.
Rosenthal, J. S. (2000). Parallel computing and Monte Carlo algorithms. Far East Journal of Theoretical Statistics, 4, 207–236.
Swendsen, R. H., & Wang, J.-S. (1986). Replica Monte Carlo simulation of spin-glasses. Physical Review Letters, 57, 2607.
VanDerwerken, D. N., & Schmidler, S. C. (2013). Parallel Markov chain Monte Carlo. arXiv preprint arXiv:1312.7479.
Wilkinson, D. J. (2006). Parallel Bayesian computation. Statistics Textbooks and Monographs, 184, 477.
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) Grant funded by the Korea government (MSIT) (No. 2018R1A2A3074973).
Cite this article
Kim, M., Lee, J. Hamiltonian Markov chain Monte Carlo for partitioned sample spaces with application to Bayesian deep neural nets. J. Korean Stat. Soc. 49, 139–160 (2020). https://doi.org/10.1007/s42952-019-00001-3