Abstract
In recent years, big datasets have often been split into several subsets because of storage constraints. We propose a parallel group Bayesian method for statistical inference on sparse big data. The method improves on existing approaches in two respects: the full dataset is split into a sequence of data subsets, and the parameter vector is divided into several sub-vectors. In addition, we introduce a weight sequence to optimize the combination of the sub-estimators when each of them has a different covariance matrix. We establish several theoretical properties of the resulting estimator. Numerical simulations show that our method is consistent with the theoretical results and is more efficient than classical Markov chain Monte Carlo methods.
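To make the combination idea concrete, here is a minimal sketch assuming inverse-variance (precision) weighting of the sub-estimators computed on the data subsets; the function name combine_weighted and the toy data are illustrative assumptions, not the authors' code or their exact weighting scheme.

```python
import numpy as np

def combine_weighted(sub_estimates, sub_covariances):
    """Pool K sub-estimators with inverse-variance weights.

    A standard precision-weighted combination:
    beta_hat = (sum_k S_k^{-1})^{-1} sum_k S_k^{-1} beta_k,
    mirroring the idea of weighting sub-estimators whose
    covariance matrices differ.
    """
    precisions = [np.linalg.inv(S) for S in sub_covariances]
    total_precision = np.sum(precisions, axis=0)
    weighted_sum = np.sum(
        [P @ b for P, b in zip(precisions, sub_estimates)], axis=0
    )
    return np.linalg.solve(total_precision, weighted_sum)

# Toy example: three subsets, two-dimensional parameter.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0])
subs, covs = [], []
for k in range(3):
    S = np.diag(rng.uniform(0.5, 2.0, size=2))  # subset-specific covariance
    subs.append(beta_true + rng.multivariate_normal(np.zeros(2), S))
    covs.append(S)
print(combine_weighted(subs, covs))  # pooled estimate, near beta_true up to noise
```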
Acknowledgements
We thank a co-editor and three anonymous referees for their extremely valuable suggestions. This work was supported by a grant from the Natural Science Foundation of Shandong (Project ZR2016AM09).
Appendix: Technical proofs
In this section, we collect the technical proofs.
Proof of Theorem 1
Take \(t=C_r\log n/\epsilon \), where \(C_r\) is a large constant. Noting that \(e^x\le 1+x+\frac{1}{2}x^2e^{|x|}\) for all \(x>0\), we obtain
Here \(C_r>(r+\epsilon )\) for \(\epsilon >0\), and \(c\) is a suitable constant. This proves the theorem. \(\square \)
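The elementary inequality \(e^x\le 1+x+\frac{1}{2}x^2e^{|x|}\) used at the start of this proof follows from Taylor's theorem with Lagrange remainder:

$$\begin{aligned} e^x=1+x+\tfrac{1}{2}x^2e^{\xi }\quad \text {for some } \xi \text { between } 0 \text { and } x, \qquad \text {and}\quad e^{\xi }\le e^{|x|}. \end{aligned}$$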
Proof of Theorem 2
Since \(\{w_{k}\}_{k=1}^{K_n}\) and \(\{\epsilon _{I_k}\}_{k=1}^{K_n}\) are independent, we have, for \(r>0\),
Therefore, \(\sum _{k=1}^\infty w_{k}\epsilon _{I_k}\) converges.
Let \(s_{K_n}=\sum _{k=1}^{K_n}\epsilon _{I_k}\) with \(s_0=0\), and let \(Y_{K_n}=s_{K_n}/K_n^{1/r}\) for \(K_n\ge 1\). Then \(\lim _{K_n\rightarrow \infty }Y_{K_n}=0\), and
We obtain
Let \(D_{I_k}=k^{1/r}(w_{k}-w_{k+1})\). Then \(\sum _{k=1}^\infty |D_{I_k}|< C_w\) and \(\lim _{k\rightarrow \infty }D_{I_k} =0\) for \(k,n\in \mathbb {N}^+\). For any \(\epsilon >0\), choose \(N_D\) such that \(\Vert Y_k\Vert _1<\epsilon \) for all \(k\ge N_D\). We then have
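For completeness, the underlying algebraic step is a summation-by-parts (Abel) identity; with \(s_0=0\), \(s_k=k^{1/r}Y_k\) and \(D_{I_k}\) as defined above, a standard form of it reads

$$\begin{aligned} \sum _{k=1}^{K_n} w_{k}\epsilon _{I_k}=\sum _{k=1}^{K_n} w_{k}(s_k-s_{k-1}) =\sum _{k=1}^{K_n-1}(w_{k}-w_{k+1})s_k+w_{K_n}s_{K_n} =\sum _{k=1}^{K_n-1}D_{I_k}Y_k+w_{K_n}K_n^{1/r}Y_{K_n}, \end{aligned}$$so the sum is controlled by \(\sum _{k}|D_{I_k}|\,\Vert Y_k\Vert _1\) together with \(\lim _{K_n\rightarrow \infty }Y_{K_n}=0\).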
Letting \(\epsilon \rightarrow 0\), we have
Thus, for \(K_n=O(\sqrt{n})\),
\(\square \)
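A small simulation confirms the behavior established above; the equal weights \(w_k=1/K_n\) and the i.i.d. standard normal subset noise are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy check that the weighted noise term shrinks as the number of
# subsets K grows (assumed weights 1/K, assumed N(0,1) subset noise).
rng = np.random.default_rng(1)
for K in (10, 100, 1000, 10000):
    w = np.full(K, 1.0 / K)
    eps = rng.standard_normal(K)
    print(K, abs(np.sum(w * eps)))  # decays roughly like K**(-1/2)
```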
Proof of Theorem 3
The proof proceeds in three steps.
i) \(\pi \big [p_{I_k,M}: d_1(p_{I_k}^*,p_{I_k,M})\le \epsilon _{n_k}^2/4\big ]\ge \exp (-n_k\epsilon _{n_k}^2/4)\) for all sufficiently large \(n_k\), where \(n=\sum _{k=1}^{K_n} n_k\). Since \(\Vert \theta _{I_k}^*-\theta _{I_k,M}\Vert _1\longrightarrow 0\), we have \(d_1(p_{I_k}^*,p_{I_k,M})=h(\theta _{I_k}^l)(\theta _{I_k}^*-\theta _{I_k,M})\), where the derivative function \(h\) is continuous in a neighborhood of \(\theta _{I_k}^*\) and \(\theta _{I_k}^l\) lies between \(\theta _{I_k}^*\) and \(\theta _{I_k,M}\). Observe that, for \(k=1,\ldots ,K_n\),
$$\begin{aligned} \Vert \theta _{I_k}^*-\theta _{I_k}^l\Vert _1\le & {} \Vert \theta _{I_k}^*-\theta _{I_k,M}\Vert _1\le \sum _{g=1}^{G_n} \big \Vert X_{I_{k,g}}\beta _{I_{k,g}}^*- M_{I_g} X_{I_{k,g}}^M \beta _{I_{k,g}}^M\big \Vert _1 \\\le & {} \bigg \Vert \sum _{g=1}^{G_n} M_{{\setminus } I_g} X_{I_{k,g}}\beta _{I_{k,g}}^*\bigg \Vert _1+\bigg \Vert \sum _{g=1}^{G_n} X_{I_{k,g}}(\beta _{I_{k,g}}^*- M_{I_g}\beta _{I_{k,g}}^M)\bigg \Vert _1 \\\le & {} C_\theta (\Delta _{n_k}+r_{n_k}\delta _{n_k}), \end{aligned}$$where \(\Delta _{n_k}= \sum _{g=1}^{G_n}\Vert M_{{\setminus } I_g} \beta _{I_{k,g}}^*\Vert _1\), and \(C_\theta \) is a constant. By (A1) and Theorem 2, note that
$$\begin{aligned} \Vert \theta _{I_k}^l\Vert _1\le \Vert \theta _{I_k}^*\Vert _1+\Vert \theta _{I_k}^*-\theta _{I_k}^l\Vert _1\le \lim _{n_k\rightarrow \infty }\sum _{g=1}^{G_n} \Vert \beta _{I_{k,g}}^*\Vert _1+\Delta _{n_k}+r_{n_k}\delta _{n_k}, \end{aligned}$$so \(\Vert \theta _{I_k}^l\Vert _1\) is bounded since \(r_{n_k}\delta _{n_k}\longrightarrow 0\). Therefore \(\Vert h(\theta _{I_k}^l)\Vert _1\) is bounded, and there exists a constant \(C_h\) such that
$$\begin{aligned} d_1(p_{I_k}^*,p_{I_k,M})\le C_h(\Delta _{n_k}+r_{n_k}\delta _{n_k}). \end{aligned}$$Let \(\delta _{n_k}=c_\epsilon \epsilon _{n_k}^2/|M| \) for a suitable constant \(c_\epsilon >0\). Since \(\Delta _{n_k}\prec \epsilon _{n_k}^2\), we have \(d_1(p_{I_k}^*,p_{I_k,M})\le \epsilon _{n_k}^2/4\). Let \(S_{I_k}=\{p(y_{I_k}|M,\beta _{I_k}):\beta _{I_k}\in (\beta _{I_{k,g}}^*\pm \delta _{n_k})_{g=1,\ldots ,G_n}\}\) and \(T_{I_k}=\{p_{I_k}:d_1(p_{I_k}^*,p_{I_k})\le \epsilon _{n_k}^2/4\}\). For all sufficiently large \(n_k\), \(\pi (T_{I_k})>\pi (S_{I_k})\ge \exp (-n_k\epsilon _{n_k}^2/4)\).
ii) \(\log N(\epsilon _{n_k},{\mathcal {P}}_{n_k})\le n_k \epsilon _{n_k}^2\) for all sufficiently large \(n_k\). Denote the regression parameters \(u_{I_k}=\{u_{I_{k,1}},\ldots ,u_{I_{k,G_n}}\}\) and \(v_{I_k}=\{v_{I_{k,1}},\ldots ,v_{I_{k,G_n}}:\Vert v_{I_{k,g}}\Vert _1\le C_G\}\), where \(C_G\) is the constant with \(\Vert \beta _{I_{k,g}}\Vert _1\le C_G\), and \(u_{I_{k,g}}\) and \(v_{I_{k,g}}\) are zero for the same set of components \(M\) (\(|M|\le \bar{r}_{n_k}\)). Let \(p_{I_{k,u}}=\exp [y_{I_k}^\top \theta _{I_{k,u}}-b(\theta _{I_{k,u}})+c(y_{I_k})]\) with \(\theta _{I_{k,u}}=\sum _{g=1}^{G_n}X_{I_{k,g}}u_{I_{k,g}}\), and define \(p_{I_{k,v}}\) and \(\theta _{I_{k,v}}\) analogously. Then the Hellinger distance satisfies \(d(p_{I_{k,u}},p_{I_{k,v}})\le \sqrt{d_0(p_{I_{k,u}},p_{I_{k,v}})} \) and
$$\begin{aligned} d_0(p_{I_{k,u}},p_{I_{k,v}})\le E(b'(\theta _{I_{k,v}})-b'(\theta _{I_k}^l))^\top (\theta _{I_{k,v}}-\theta _{I_{k,u}}), \end{aligned}$$where \(\theta _{I_k}^l\) lies between \(\theta _{I_{k,v}}\) and \(\theta _{I_{k,u}}\). Observe that
$$\begin{aligned} \Vert \theta _{I_{k,v}}-\theta _{I_{k,u}}\Vert _1=\bigg \Vert \sum _{g=1}^{G_n} X_{I_{k,g}}(v_{I_{k,g}}-u_{I_{k,g}})\bigg \Vert _1\le C_\theta \bar{r}_{n_k} \delta ; \end{aligned}$$then,
$$\begin{aligned} d_0(p_{I_{k,u}},p_{I_{k,v}})\le 2 \sup _{\Vert \theta \Vert _1\le \bar{r}_{n_k} C_G} \Vert b'(\theta )\Vert _1 \bar{r}_{n_k} \delta , \qquad d(p_{I_{k,u}},p_{I_{k,v}})\le \sqrt{2 \sup _{\Vert \theta \Vert _1\le \bar{r}_{n_k} C_G} \Vert b'(\theta )\Vert _1 \bar{r}_{n_k} \delta }. \end{aligned}$$Therefore, \(d(p_{I_{k,u}},p_{I_{k,v}})\le \epsilon _{n_k}\) if \(\delta =\epsilon _{n_k}^2/\{2\sup _{\Vert \theta _{I_k}\Vert _1\le \bar{r}_{n_k} C_G} \Vert b'(\theta _{I_k})\Vert _1 \bar{r}_{n_k} \}\). Additionally,
$$\begin{aligned} N(\epsilon _{n_k},{\mathcal {P}}_{n_k})\le & {} (\bar{r}_{n_k}+1)G_n^{\bar{r}_{n_k}} \biggl (1+2\epsilon _{n_k}^{-2}\cdot \sup _{\Vert \theta _{I_k}\Vert _1\le \bar{r}_{n_k} C_G} \Vert b'(\theta _{I_k})\Vert _1 \bar{r}_{n_k} C_G \biggr )^{\bar{r}_{n_k}}. \end{aligned}$$By (A2), we obtain the result of ii); a numerical illustration is given after this proof.
iii) Using (A2), \(\pi ({\mathcal {P}}_{n_k}^c)\le \exp (-2n_k\epsilon _{n_k}^2)\) for all sufficiently large \(n_k\). Observe that
$$\begin{aligned} \pi ({\mathcal {P}}_{n_k}^c)\le & {} \pi (|M|>\bar{r}_{n_k})\\&+\sum _{M:|M|\le \bar{r}_{n_k}}\pi (M) \pi \bigg (\bigcup _{g=1,\ldots ,G_n}\Big (\Vert \beta _{I_{k,g}}\Vert _1>C_G\Big )\big |M\bigg ). \end{aligned}$$
Then,
for all sufficiently large \(n_k\). This gives the result of iii).
By i), ii), and iii), there exists \(N_{n_k}\) such that
Thus, we have the theorem. \(\square \)
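As a quick numerical check of the entropy condition in step ii), the bound \(\log N(\epsilon _{n_k},{\mathcal {P}}_{n_k})\le n_k\epsilon _{n_k}^2\) can be evaluated directly; all numerical values below (r_bar, G, n, eps, C) are made up for illustration and are not taken from the paper.

```python
import numpy as np

# Illustrative (made-up) values: r_bar caps the model size |M|, G is the
# number of groups G_n, n is the subset sample size n_k, eps is the rate
# eps_{n_k}, and C stands in for sup ||b'(theta)||_1 * r_bar * C_G.
r_bar, G, n, eps, C = 10, 1_000, 100_000, 0.1, 50.0

# Logarithm of the covering-number bound from step ii):
# N <= (r_bar + 1) * G**r_bar * (1 + 2*C/eps**2)**r_bar
log_N = np.log(r_bar + 1) + r_bar * np.log(G) + r_bar * np.log1p(2 * C / eps**2)
print(f"log N bound = {log_N:.1f}, n * eps^2 = {n * eps**2:.1f}")
# Here log N is about 164 <= 1000 = n * eps^2, so the condition holds.
```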
Proof of Theorem 4
Since \(|p(\beta , M)|\) is bounded, there exists a constant \(C_M\) satisfying
By Chebyshev’s inequality,
With (A3), we obtain
By Hoeffding’s inequality, (A4) and (A5), we have
Setting \(t=\varepsilon \), we have
This completes the proof. \(\square \)
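For reference, the classical Hoeffding inequality invoked above states that for independent random variables \(Z_i\) with \(a_i\le Z_i\le b_i\) almost surely,

$$\begin{aligned} P\left( \bigg |\frac{1}{n}\sum _{i=1}^n\big (Z_i-E Z_i\big )\bigg |\ge t\right) \le 2\exp \left( -\frac{2n^2t^2}{\sum _{i=1}^n(b_i-a_i)^2}\right) , \qquad t>0. \end{aligned}$$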