Abstract
Hierarchical normalized discrete random measures identify a general class of priors that is suited to flexibly learn how the distribution of a response variable changes across groups of observations. A special case widely used in practice is the hierarchical Dirichlet process. Although current theory on hierarchies of nonparametric priors yields all the relevant tools for drawing posterior inference, implementing them comes at a high computational cost. We address this issue by proposing an approximation for a general class of hierarchical processes, which leads to an efficient conditional Gibbs sampling algorithm. The key idea consists of a deterministic truncation of the underlying random probability measures, leading to a finite-dimensional approximation of the original prior law. We provide both empirical and theoretical support for this procedure.
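To make the truncation idea concrete, the following minimal sketch (our illustration, not the authors' algorithm; the function names and the choice of assigning the leftover stick mass to the last weight are ours) simulates a deterministically truncated stick-breaking Pitman–Yor process, together with a two-level hierarchical version in which each group-level weight is attached to one of the H shared atoms:

```python
import random

def truncated_stick_breaking(sigma, c, H, rng):
    """Stick-breaking weights of a Pitman-Yor(sigma, c) process, deterministically
    truncated at level H: the mass left after H - 1 breaks goes to the last
    weight, so the truncated vector sums exactly to one."""
    weights, stick = [], 1.0
    for h in range(1, H):
        v = rng.betavariate(1.0 - sigma, c + h * sigma)  # PY stick proportions
        weights.append(stick * v)
        stick *= 1.0 - v
    weights.append(stick)  # leftover mass closes the truncation
    return weights

def truncated_hierarchical_py(sigma0, c0, sigma, c, H, d, rng):
    """Two-level sketch: common weights pi_0 over H shared atoms, and d
    group-level vectors whose stick-breaking weights are attached to atoms
    drawn from pi_0, so that atoms are shared across groups."""
    pi0 = truncated_stick_breaking(sigma0, c0, H, rng)
    groups = []
    for _ in range(d):
        probs = [0.0] * H
        for w in truncated_stick_breaking(sigma, c, H, rng):
            atom = rng.choices(range(H), weights=pi0)[0]  # shared atom from pi_0
            probs[atom] += w
        groups.append(probs)
    return pi0, groups
```

By construction each truncated vector is a probability vector of length H, which is the finite-dimensional object the conditional Gibbs sampler operates on.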
References
Arbel, J., Prünster, I.: A moment-matching Ferguson & Klass algorithm. Stat. Comput. 27(1), 3–17 (2017)
Arbel, J., De Blasi, P., Prünster, I.: Stochastic approximations to the Pitman-Yor process. Bayesian Anal. 14(4), 1201–1219 (2019)
Argiento, R., Bianchini, I., Guglielmi, A.: A blocked Gibbs sampler for NGG-mixture models via a priori truncation. Stat. Comput. 26(3), 641–666 (2016)
Argiento, R., Cremaschi, A., Vannucci, M.: Hierarchical normalized completely random measures to cluster grouped data. J. Am. Stat. Assoc. 115(529), 318–333 (2020)
Barrios, E., Lijoi, A., Nieto-Barajas, L.E., Prünster, I.: Modeling with normalized random measure mixture models. Stat. Sci. 28(3), 313–334 (2013)
Bassetti, F., Casarin, R., Rossini, L.: Hierarchical species sampling models. Bayesian Anal. (2020). https://doi.org/10.1214/19-BA1168
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Camerlenghi, F., Lijoi, A., Orbanz, P., Prünster, I.: Distribution theory for hierarchical processes. Ann. Stat. 47(1), 67–92 (2019)
Carlton, M.A.: A family of densities derived from the three-parameter Dirichlet process. J. Appl. Prob. 39, 764–774 (2002)
Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., Riddell, A.: Stan: a probabilistic programming language. J. Stat. Softw. 76(1), 1–32 (2017)
Cifarelli, D., Regazzini, E.: Problemi statistici non parametrici in condizioni di scambiabilità parziale. Technical report, Quaderni Istituto Matematica Finanziaria, Università di Torino Serie III, 12 (1978)
Clogg, C.C., Goodman, L.A.: On scaling models applied to data from several groups. Psychometrika 51(1), 123–135 (1986)
Connor, R.J., Mosimann, J.E.: Concepts of independence for proportions with a generalization of the Dirichlet distribution. J. Am. Stat. Assoc. 64(325), 194–206 (1969)
Daley, D.J., Vere-Jones, D.: An Introduction to the Theory of Point Processes. Vol. I: Elementary Theory and Methods, 2nd edn. Probability and its Applications. Springer, New York (2003)
De Blasi, P., Favaro, S., Lijoi, A., Mena, R.H., Prünster, I., Ruggiero, M.: Are Gibbs-type priors the most natural generalization of the Dirichlet process? IEEE Trans. Pattern Anal. Mach. Intell. 37(2), 212–229 (2015)
Diaconis, P., Ylvisaker, D.: Conjugate priors for exponential families. Ann. Stat. 7(2), 269–292 (1979)
Dunson, D.B., Xing, C.: Nonparametric Bayes modeling of multivariate categorical data. J. Am. Stat. Assoc. 104(487), 1042–1051 (2009)
Escobar, M.D., West, M.: Bayesian density estimation and inference using mixtures. J. Am. Stat. Assoc. 90(430), 577–588 (1995)
Favaro, S., Hadjicharalambous, G., Prünster, I.: On a class of distributions on the simplex. J. Stat. Plan. Inference 141(9), 2987–3004 (2011)
Ferguson, T.S.: A Bayesian analysis of some nonparametric problems. Ann. Stat. 1(2), 209–230 (1973)
Ferguson, T.S., Klass, M.J.: A representation of independent increment processes without Gaussian components. Ann. Math. Stat. 43(5), 1634–1643 (1972)
Fox, E.B., Sudderth, E.B., Jordan, M.I., Willsky, A.S.: A sticky HDP-HMM with application to speaker diarization. Ann. Appl. Stat. 5(2A), 1020–1056 (2011)
Goodman, L.A.: Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61(2), 215–231 (1974)
Goodman, L.A.: A new model for scaling response patterns: an application of the quasi-independence concept. J. Am. Stat. Assoc. 70(352), 755–768 (1975)
Griffin, J.E., Leisen, F.: Compound random measures and their use in Bayesian non-parametrics. J. R. Stat. Soc. Series B Stat. Methodol. 79(2), 525–545 (2017)
Hagenaars, J.A., McCutcheon, A.L.: Applied Latent Class Analysis. Cambridge University Press, Cambridge (2002)
Ishwaran, H., James, L.F.: Gibbs sampling methods for stick-breaking priors. J. Am. Stat. Assoc. 96(453), 161–173 (2001)
Ishwaran, H., Zarepour, M.: Exact and approximate sum representations for the Dirichlet process. Can. J. Stat. 30(2), 269–283 (2002)
James, L.F., Lijoi, A., Prünster, I.: Conjugacy as a distinctive feature of the Dirichlet process. Scand. J. Stat. 33(1), 105–120 (2006)
Kingman, J.F.C.: Random discrete distributions. J. R. Stat. Soc. Ser. B Stat. Methodol. 37, 1–22 (1975)
Lazarsfeld, P.F., Henry, N.W.: Latent structure analysis. Houghton Mifflin, Boston, MA (1968)
Lijoi, A., Nipoti, B.: A class of hazard rate mixtures for combining survival data from different experiments. J. Am. Stat. Assoc. 109(506), 802–814 (2014)
Lijoi, A., Prünster, I.: Models beyond the Dirichlet process. In: Hjort, N.L., Holmes, C.C., Müller, P., Walker, S.G. (eds.) Bayesian Nonparametrics. Cambridge University Press, Cambridge (2010)
Lijoi, A., Mena, R.H., Prünster, I.: Hierarchical mixture modeling with normalized inverse-Gaussian priors. J. Am. Stat. Assoc. 100(472), 1278–1291 (2005)
Lijoi, A., Mena, R.H., Prünster, I.: Controlling the reinforcement in Bayesian non-parametric mixture models. J. R. Stat. Soc. Ser. B Stat. Methodol. 69(4), 715–740 (2007)
Lijoi, A., Nipoti, B., Prünster, I.: Bayesian inference with dependent normalized completely random measures. Bernoulli 20(3), 1260–1291 (2014a)
Lijoi, A., Nipoti, B., Prünster, I.: Dependent mixture models: clustering and borrowing information. Comput. Stat. Data Anal. 71, 417–433 (2014b)
Lijoi, A., Prünster, I., Rigon, T.: Finite-dimensional discrete random structures and Bayesian clustering. Tech. rep. Collegio Carlo Alberto, n. 600 (2019)
Lo, A.Y.: On a class of Bayesian nonparametric estimates: I. density estimates. Ann. Stat. 12(1), 351–357 (1984)
MacEachern, S.N.: Dependent nonparametric processes. In: ASA Proceedings of the Section on Bayesian Statistical Science, pp. 50–55. American Statistical Association, Alexandria, VA (1999)
Muliere, P., Tardella, L.: Approximating distributions of random functionals of Ferguson-Dirichlet priors. Can. J. Stat. 26(2), 283–297 (1998)
Perman, M.: Random discrete distributions derived from subordinators. Ph.D. thesis, University of California, Berkeley. ProQuest LLC, Ann Arbor, MI (1990)
Perman, M., Pitman, J., Yor, M.: Size-biased sampling of Poisson point processes and excursions. Prob. Theory Related Fields 92(1), 21–39 (1992)
Pitman, J.: Some developments of the Blackwell–MacQueen urn scheme. In: Statistics, Probability and Game Theory. IMS Lecture Notes Monogr. Ser. 30, 245–267 (1996)
Pitman, J., Yor, M.: The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Prob. 25(2), 855–900 (1997)
Regazzini, E., Lijoi, A., Prünster, I.: Distributional results for means of normalized random measures with independent increments. Ann. Stat. 31(2), 560–585 (2003)
Roberts, G.O., Rosenthal, J.S.: Examples of adaptive MCMC. J. Comput. Graph. Stat. 18(2), 349–367 (2009)
Sethuraman, J.: A constructive definition of Dirichlet priors. Stat. Sin. 4(2), 639–650 (1994)
Stouffer, S.A., Toby, J.: Role conflict and personality. Am. J. Sociol. 56(5), 395–406 (1951)
Teh, Y.W., Jordan, M.I.: Hierarchical Bayesian nonparametric models with applications. In: Hjort, N.L., Holmes, C.C., Müller, P., Walker, S.G. (eds.) Bayesian Nonparametrics. Cambridge University Press, Cambridge (2010)
Teh, Y.W., Jordan, M.I., Beal, M.J., Blei, D.M.: Hierarchical Dirichlet processes. J. Am. Stat. Assoc. 101(476), 1566–1581 (2006)
Zhang, L., Guindani, M., Versace, F., Engelmann, J.M., Vannucci, M.: A spatiotemporal nonparametric Bayesian model of multi-subject fMRI data. Ann. Appl. Stat. 10(2), 638–666 (2016)
Acknowledgements
Most of the paper was completed while T. Rigon was a Ph.D. student at Bocconi University, Milan. A. Lijoi and I. Prünster were partially supported by MIUR, PRIN Project 2015SNS29B. T. Rigon was partially supported by grant R01ES027498 of the National Institute of Environmental Health Sciences of the United States National Institutes of Health.
Appendix
1.1 Proof of Theorem 1
Recall that \(({\tilde{p}}_1, \ldots , {\tilde{p}}_d)\) comes from a hierarchical nrmi-py process as in (6). Moreover, let \(({\tilde{p}}_1^H, \dots , {\tilde{p}}_d^H)\) be the approximate hierarchical nrmi-py process defined in (10), with truncation level H. Then for any \(A \in {\mathscr {X}}\), and exploiting representation (8), we have that almost surely
Note that \(\sum _{h> H} \pi _{lh} \ge \sum _{h > H}\pi _{lh}\delta _{\phi _h}(A)\) almost surely: since each indicator \(\delta _{\phi _h}(A)\) takes values in \(\{0,1\}\), the inequality holds term by term, both when \(\delta _{\phi _h}(A) = 0\) and when \(\delta _{\phi _h}(A) = 1\). Hence,
almost surely. Moreover, note that
from which it follows that the expected value is equal to
Now recall that \({\mathcal {I}}(c,\rho ) = c \int _{\mathbb {R}^+}u \mathrm {e}^{-c \psi (u)} \tau _2(u) \,{\text {d}}{u}\) with \(\tau _2(u) = \int _{\mathbb {R}^+}s^2 \mathrm {e}^{-us}\rho ( \,{\text {d}}{s})\) and let \(R_{0H} = \sum _{h > H} \pi _{0h}\), then
where \(\mathbb {E}(R_{0H})\) can be computed as before and
recalling that \((x)_{r} = x(x+1)\cdots (x+r-1)\) denotes the Pochhammer symbol.
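Since the Pochhammer symbol recurs in these expressions, a direct computation may help; the helper below is a generic illustration (ours, not from the paper), which for positive arguments agrees with the identity \((x)_r = \Gamma (x+r)/\Gamma (x)\):

```python
from math import gamma

def pochhammer(x, r):
    """Rising factorial (x)_r = x (x + 1) ... (x + r - 1), with (x)_0 = 1."""
    out = 1.0
    for k in range(r):
        out *= x + k
    return out
```

For instance, `pochhammer(3.0, 2)` gives `3 * 4 = 12`, and for positive `x` the result matches `gamma(x + r) / gamma(x)` up to floating-point error.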
1.2 Proof of Theorem 2
First note that the expected value of the truncated Pitman–Yor process \({\tilde{p}}_0^H \sim {\textsc {py}}_H(\sigma _0,c_0,P_0)\), for any \(A \in {\mathscr {X}}\) and any \(H=1,2,\dots \), is equal to the baseline measure
Moreover, one can show that
for any \(H = 1,2,\ldots \), and \(A \in {\mathscr {X}}\). Define \({\mathcal {I}}_0(\sigma _0,c_0,H) = \sum _{h=1}^H\mathbb {E}(\pi _{0h}^2)\) and recall that \({\mathcal {I}}(c,\rho ) = c \int _{\mathbb {R}^+}u \mathrm {e}^{-c \psi (u)} \tau _2(u) \,{\text {d}}{u}\) with \(\tau _2(u) = \int _{\mathbb {R}^+}s^2 \mathrm {e}^{-us}\rho ( \,{\text {d}}{s})\). From Proposition 1 of James et al. (2006), one has that \(\text {Var}({\tilde{p}}_l^H(A) \mid {\tilde{p}}_0^H) = P_0(A)(1- P_0(A)){\mathcal {I}}(c,\rho )\) for any \(A \in {\mathscr {X}}\). Hence, for any \(l=1,\dots ,d\),
Moreover, following Camerlenghi et al. (2019, Appendix A.1), for any \(l \ne l'\)
from which it follows that
It remains to derive the explicit expression of \({\mathcal {I}}_0(\sigma _0,c_0,H)\), which equals
All the above results also hold in the infinite case, with \({\mathcal {I}}_0(\sigma _0,c_0,H)\) replaced by its limit \({\mathcal {I}}_0(\sigma _0,c_0)\), so that
where the last equality follows for instance from Ishwaran and James (2001, Appendix A.2).
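As a sanity check on the integral \({\mathcal {I}}(c,\rho )\) (our computation, not part of the paper): for the gamma intensity \(\rho (\,{\text {d}}{s}) = s^{-1}\mathrm {e}^{-s}\,{\text {d}}{s}\) underlying the Dirichlet process, one has \(\psi (u) = \log (1+u)\) and \(\tau _2(u) = (1+u)^{-2}\), so the integrand reduces to \(u(1+u)^{-(c+2)}\) and \({\mathcal {I}}(c,\rho ) = c\,B(2,c) = 1/(c+1)\), the familiar Dirichlet-process variance factor. A quick numerical verification:

```python
def I_gamma(c, upper=1000.0, n=200_000):
    """Trapezoidal approximation of I(c, rho) = c * int_0^inf u e^{-c psi(u)}
    tau_2(u) du in the gamma case, where the integrand is u (1 + u)^(-(c + 2));
    the tail beyond `upper` is negligible for c >= 2."""
    h = upper / n
    total = 0.0
    for k in range(n + 1):
        u = k * h
        f = u * (1.0 + u) ** (-(c + 2.0))
        total += f / 2.0 if k in (0, n) else f
    return c * total * h
```

For example, `I_gamma(2.0)` is close to \(1/3\) and `I_gamma(3.0)` close to \(1/4\), matching \(1/(c+1)\).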
1.3 Dataset
We report in Table 3 the dataset used in the illustrative analysis of Sect. 6 and originally presented in Stouffer and Toby (1951).
Lijoi, A., Prünster, I. & Rigon, T. Sampling hierarchies of discrete random structures. Stat Comput 30, 1591–1607 (2020). https://doi.org/10.1007/s11222-020-09961-7