Sampling hierarchies of discrete random structures

Statistics and Computing

Abstract

Hierarchical normalized discrete random measures identify a general class of priors that is suited to flexibly learn how the distribution of a response variable changes across groups of observations. A special case widely used in practice is the hierarchical Dirichlet process. Although current theory on hierarchies of nonparametric priors yields all relevant tools for drawing posterior inference, their implementation comes at a high computational cost. We fill this gap by proposing an approximation for a general class of hierarchical processes, which leads to an efficient conditional Gibbs sampling algorithm. The key idea consists of a deterministic truncation of the underlying random probability measures leading to a finite dimensional approximation of the original prior law. We provide both empirical and theoretical support for such a procedure.

References

  • Arbel, J., Prünster, I.: A moment-matching Ferguson & Klass algorithm. Stat. Comput. 27(1), 3–17 (2017)

  • Arbel, J., De Blasi, P., Prünster, I.: Stochastic approximations to the Pitman-Yor process. Bayesian Anal. 14(4), 1201–1219 (2019)

  • Argiento, R., Bianchini, I., Guglielmi, A.: A blocked Gibbs sampler for NGG-mixture models via a priori truncation. Stat. Comput. 26(3), 641–666 (2016)

  • Argiento, R., Cremaschi, A., Vannucci, M.: Hierarchical normalized completely random measures to cluster grouped data. J. Am. Stat. Assoc. 115(529), 318–333 (2020)

  • Barrios, E., Lijoi, A., Nieto-Barajas, L.E., Prünster, I.: Modeling with normalized random measure mixture models. Stat. Sci. 28(3), 313–334 (2013)

  • Bassetti, F., Casarin, R., Rossini, L.: Hierarchical species sampling models. Bayesian Anal. (2020). https://doi.org/10.1214/19-BA1168

  • Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  • Camerlenghi, F., Lijoi, A., Orbanz, P., Prünster, I.: Distribution theory for hierarchical processes. Ann. Stat. 47(1), 67–92 (2019)

  • Carlton, M.A.: A family of densities derived from the three-parameter Dirichlet process. J. Appl. Prob. 39, 764–774 (2002)

  • Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., Riddell, A.: Stan: a probabilistic programming language. J. Stat. Softw. 76(1), 1–32 (2017)

  • Cifarelli, D., Regazzini, E.: Problemi statistici non parametrici in condizioni di scambiabilità parziale. Technical report, Quaderni Istituto Matematica Finanziaria, Università di Torino Serie III, 12 (1978)

  • Clogg, C.C., Goodman, L.A.: On scaling models applied to data from several groups. Psychometrika 51(1), 123–135 (1986)

  • Connor, R.J., Mosimann, J.E.: Concepts of independence for proportions with a generalization of the Dirichlet distribution. J. Am. Stat. Assoc. 64(325), 194–206 (1969)

  • Daley, D.J., Vere-Jones, D.: An Introduction to the Theory of Point Processes. Vol. I: Elementary Theory and Methods, 2nd edn. Probability and its Applications. Springer, New York (2003)

  • De Blasi, P., Favaro, S., Lijoi, A., Mena, R.H., Prünster, I., Ruggiero, M.: Are Gibbs-type priors the most natural generalization of the Dirichlet process? IEEE Trans. Pattern Anal. Mach. Intell. 37(2), 212–229 (2015)

  • Diaconis, P., Ylvisaker, D.: Conjugate priors for exponential families. Ann. Stat. 7(2), 269–292 (1979)

  • Dunson, D.B., Xing, C.: Nonparametric Bayes modeling of multivariate categorical data. J. Am. Stat. Assoc. 104(487), 1042–1051 (2009)

  • Escobar, M.D., West, M.: Bayesian density estimation and inference using mixtures. J. Am. Stat. Assoc. 90(430), 577–588 (1995)

  • Favaro, S., Hadjicharalambous, G., Prünster, I.: On a class of distributions on the simplex. J. Stat. Plan. Inference 141(9), 2987–3004 (2011)

  • Ferguson, T.S.: A Bayesian analysis of some nonparametric problems. Ann. Stat. 1(2), 209–230 (1973)

  • Ferguson, T.S., Klass, M.J.: A representation of independent increment processes without Gaussian components. Ann. Math. Stat. 43(5), 1634–1643 (1972)

  • Fox, E.B., Sudderth, E.B., Jordan, M.I., Willsky, A.S.: A sticky HDP-HMM with application to speaker diarization. Ann. Appl. Stat. 5(2A), 1020–1056 (2011)

  • Goodman, L.A.: Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61(2), 215–231 (1974)

  • Goodman, L.A.: A new model for scaling response patterns: an application of quasi independence concept. J. Am. Stat. Assoc. 70(352), 755–768 (1975)

  • Griffin, J.E., Leisen, F.: Compound random measures and their use in Bayesian non-parametrics. J. R. Stat. Soc. Series B Stat. Methodol. 79(2), 525–545 (2017)

  • Hagenaars, J.A., McCutcheon, A.L.: Applied Latent Class Analysis. Cambridge University Press, Cambridge (2002)

  • Ishwaran, H., James, L.F.: Gibbs sampling methods for stick-breaking priors. J. Am. Stat. Assoc. 96(453), 161–173 (2001)

  • Ishwaran, H., Zarepour, M.: Exact and approximate sum representations for the Dirichlet process. Can. J. Stat. 30(2), 269–283 (2002)

  • James, L.F., Lijoi, A., Prünster, I.: Conjugacy as a distinctive feature of the Dirichlet process. Scand. J. Stat. 33(1), 105–120 (2006)

  • Kingman, J.F.C.: Random discrete distributions. J. R. Stat. Soc. Ser. B Stat. Methodol. 37, 1–22 (1975)

  • Lazarsfeld, P.F., Henry, N.W.: Latent structure analysis. Houghton Mifflin, Boston, MA (1968)

  • Lijoi, A., Nipoti, B.: A class of hazard rate mixtures for combining survival data from different experiments. J. Am. Stat. Assoc. 109(506), 802–814 (2014)

  • Lijoi, A., Prünster, I.: Models beyond the Dirichlet process. In: Hjort, N.L., Holmes, C.C., Müller, P., Walker, S.G. (eds.) Bayesian Nonparametrics. Cambridge University Press, Cambridge (2010)

  • Lijoi, A., Mena, R.H., Prünster, I.: Hierarchical mixture modeling with normalized inverse-Gaussian priors. J. Am. Stat. Assoc. 100(472), 1278–1291 (2005)

  • Lijoi, A., Mena, R.H., Prünster, I.: Controlling the reinforcement in Bayesian non-parametric mixture models. J. R. Stat. Soc. Ser. B Stat. Methodol. 69(4), 715–740 (2007)

  • Lijoi, A., Nipoti, B., Prünster, I.: Bayesian inference with dependent normalized completely random measures. Bernoulli 20(3), 1260–1291 (2014a)

  • Lijoi, A., Nipoti, B., Prünster, I.: Dependent mixture models: clustering and borrowing information. Comput. Stat. Data Anal. 71, 417–433 (2014b)

  • Lijoi, A., Prünster, I., Rigon, T.: Finite-dimensional discrete random structures and Bayesian clustering. Tech. rep. Collegio Carlo Alberto, n. 600 (2019)

  • Lo, A.Y.: On a class of Bayesian nonparametric estimates: I. density estimates. Ann. Stat. 12(1), 351–357 (1984)

  • MacEachern, S.N.: Dependent nonparametric processes. In: ASA Proceedings of the Section on Bayesian Statistical Science, pp. 50–55. American Statistical Association, Alexandria, VA (1999)

  • Muliere, P., Tardella, L.: Approximating distributions of random functionals of Ferguson-Dirichlet priors. Can. J. Stat. 26(2), 283–297 (1998)

  • Perman, M.: Random discrete distributions derived from subordinators. Ph.D. thesis, University of California, Berkeley. ProQuest LLC, Ann Arbor, MI (1990)

  • Perman, M., Pitman, J., Yor, M.: Size-biased sampling of Poisson point processes and excursions. Prob. Theory Related Fields 92(1), 21–39 (1992)

  • Pitman, J.: Some developments of the Blackwell-MacQueen urn scheme. Stat. Prob. Game Theory 30, 245–267 (1996)

  • Pitman, J., Yor, M.: The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Prob. 25(2), 855–900 (1997)

  • Regazzini, E., Lijoi, A., Prünster, I.: Distributional results for means of normalized random measures with independent increments. Ann. Stat. 31(2), 560–585 (2003)

  • Roberts, G.O., Rosenthal, J.S.: Examples of adaptive MCMC. J. Comput. Graph. Stat. 18(2), 349–367 (2009)

  • Sethuraman, J.: A constructive definition of Dirichlet priors. Stat. Sin. 4(2), 639–650 (1994)

  • Stouffer, S.A., Toby, J.: Role conflict and personality. Am. J. Sociol. 56(5), 395–406 (1951)

  • Teh, Y.W., Jordan, M.I.: Hierarchical Bayesian nonparametric models with applications. In: Hjort, N.L., Holmes, C.C., Müller, P., Walker, S.G. (eds.) Bayesian Nonparametrics. Cambridge University Press, Cambridge (2010)

  • Teh, Y.W., Jordan, M.I., Beal, M.J., Blei, D.M.: Hierarchical Dirichlet processes. J. Am. Stat. Assoc. 101(476), 1566–1581 (2006)

  • Zhang, L., Guindani, M., Versace, F., Engelmann, J.M., Vannucci, M.: A spatiotemporal nonparametric Bayesian model of multi-subject fMRI data. Ann. Appl. Stat. 10(2), 638–666 (2016)

Acknowledgements

Most of the paper was completed while T. Rigon was a Ph.D. student at Bocconi University, Milano. A. Lijoi and I. Prünster were partially supported by MIUR, PRIN Project 2015SNS29B. T. Rigon was partially supported by grant R01ES027498 of the National Institute of Environmental Health Sciences of the United States National Institutes of Health.

Author information

Corresponding author

Correspondence to Antonio Lijoi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (txt 15 KB)

Appendix

1.1 Proof of Theorem 1

Recall that \(({\tilde{p}}_1, \ldots , {\tilde{p}}_d)\) comes from a hierarchical NRMI-PY process as in (6). Moreover, let \(({\tilde{p}}_1^H, \dots , {\tilde{p}}_d^H)\) be the approximate hierarchical NRMI-PY process defined in (10), with truncation level H. Then for any \(A \in {\mathscr {X}}\), and exploiting representation (8), we have that almost surely

$$\begin{aligned} \left| {\tilde{p}}_l(A) - {\tilde{p}}_l^H(A) \right|&= \Big | \sum _{h=1}^\infty \pi _{lh} \delta _{\phi _h}(A) - \Big ( \sum _{h=1}^{H-1} \pi _{lh} \delta _{\phi _h}(A) + \left( 1 - \sum _{h=1}^{H-1}\pi _{lh}\right) \delta _{\phi _H}(A)\Big )\Big | \\&= \Big |\pi _{lH} \delta _{\phi _H}(A) + \sum _{h> H}\pi _{lh}\delta _{\phi _h}(A) - \left( 1 - \sum _{h=1}^{H-1}\pi _{lh}\right) \delta _{\phi _H}(A) \Big | \\&= \left| \delta _{\phi _H}(A) \sum _{h> H}\pi _{lh} - \sum _{h> H}\pi _{lh}\delta _{\phi _h}(A) \right| \\&\le \sum _{h > H} \pi _{lh} = R_{lH}. \end{aligned}$$

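Note that \(\sum _{h> H} \pi _{lh} \ge \sum _{h > H}\pi _{lh}\delta _{\phi _h}(A)\) almost surely. The last inequality then follows by cases: if \(\delta _{\phi _H}(A) = 0\), the absolute value reduces to \(\sum _{h > H}\pi _{lh}\delta _{\phi _h}(A) \le R_{lH}\), whereas if \(\delta _{\phi _H}(A) = 1\), it reduces to \(\sum _{h > H}\pi _{lh}(1 - \delta _{\phi _h}(A)) \le R_{lH}\). Hence,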

$$\begin{aligned} d_{{\textsc {tv}}}\left( {\tilde{p}}_l, {\tilde{p}}_l^H\right) =\sup _{A \in {\mathscr {X}}} \left| {\tilde{p}}_l(A) - {\tilde{p}}_l^H(A) \right| \le R_{lH} = \sum _{h > H} \pi _{lh}, \end{aligned}$$

almost surely. Moreover, note that

$$\begin{aligned} \left( \sum _{h > H}\pi _{lh} \mid {\varvec{\pi }}_0\right) \sim {\textsc {nid}}\left( c \left( 1 - \sum _{h=1}^H \pi _{0h} \right) , c \sum _{h=1}^H \pi _{0h}; \rho \right) , \end{aligned}$$

from which it follows that the expected value is equal to

$$\begin{aligned} \begin{aligned} \mathbb {E}\left( \sum _{h> H}\pi _{lh} \right)&= \mathbb {E}\left( \mathbb {E}\left( \sum _{h> H}\pi _{lh} \mid {\varvec{\pi }}_0\right) \right) \\&= \mathbb {E}\left( \sum _{h > H}\pi _{0h} \right) = \prod _{h=1}^H \mathbb {E}\left( 1 - \nu _{0h} \right) \\&= \prod _{h=1}^H \frac{c_0 + \sigma _0 h}{c_0 + \sigma _0(h-1) + 1}. \end{aligned} \end{aligned}$$
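The product formula for the expected truncation error is cheap to evaluate, so it can guide the choice of H. The following is a minimal Python sketch, not the authors' code: the function names are hypothetical, and it assumes the usual Pitman–Yor constraints \(0 \le \sigma _0 < 1\) and \(c_0 > -\sigma _0\), which this appendix does not restate.

```python
def expected_truncation_error(c0: float, sigma0: float, H: int) -> float:
    """E(R_lH) = prod_{h=1}^H (c0 + sigma0*h) / (c0 + sigma0*(h-1) + 1)."""
    # Assumed parameter range (standard for the Pitman-Yor process).
    if not (0.0 <= sigma0 < 1.0) or c0 <= -sigma0 or H < 1:
        raise ValueError("need 0 <= sigma0 < 1, c0 > -sigma0 and H >= 1")
    err = 1.0
    for h in range(1, H + 1):
        err *= (c0 + sigma0 * h) / (c0 + sigma0 * (h - 1) + 1.0)
    return err


def smallest_truncation_level(c0: float, sigma0: float, tol: float,
                              max_H: int = 100_000) -> int:
    """Smallest H with expected truncation error <= tol (hypothetical helper)."""
    if not (0.0 <= sigma0 < 1.0) or c0 <= -sigma0 or tol <= 0.0:
        raise ValueError("need 0 <= sigma0 < 1, c0 > -sigma0 and tol > 0")
    err = 1.0
    for H in range(1, max_H + 1):
        # Update the running product with the factor for level H.
        err *= (c0 + sigma0 * H) / (c0 + sigma0 * (H - 1) + 1.0)
        if err <= tol:
            return H
    raise RuntimeError("tolerance not reached within max_H levels")
```

When \(\sigma _0 = 0\) the product collapses to \((c_0/(c_0+1))^H\), so the error decays geometrically; for \(\sigma _0 > 0\) the decay is slower and a larger H is typically needed for the same tolerance.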

Now recall that \({\mathcal {I}}(c,\rho ) = c \int _{\mathbb {R}^+}u \mathrm {e}^{-c \psi (u)} \tau _2(u) \,{\text {d}}{u}\) with \(\tau _2(u) = \int _{\mathbb {R}^+}s^2 \mathrm {e}^{-us}\rho ( \,{\text {d}}{s})\) and let \(R_{0H} = \sum _{h > H} \pi _{0h}\), then

$$\begin{aligned} \begin{aligned} \text {Var}\left( R_{lH} \right)&= \mathbb {E}\left( \text {Var}\left( \sum _{h> H}\pi _{lh} \mid {\varvec{\pi }}_0 \right) \right) \\&\quad + \text {Var}\left( \mathbb {E}\left( \sum _{h > H}\pi _{lh} \mid {\varvec{\pi }}_0 \right) \right) \\&= {\mathcal {I}}(c,\rho ) \mathbb {E}\left( R_{0H} - R_{0H}^2 \right) + \mathbb {E}(R_{0H}^2) - \mathbb {E}\left( R_{0H}\right) ^2, \\ \end{aligned} \end{aligned}$$

where \(\mathbb {E}(R_{0H})\) can be computed as before and

$$\begin{aligned} \mathbb {E}\left( R_{0H}^2\right) = \prod _{h=1}^H \mathbb {E}\left( \left( 1 - \nu _{0h} \right) ^2\right) = \prod _{h=1}^H \frac{(c_0 + \sigma _0 h)_2}{(c_0 + \sigma _0(h-1) + 1)_2}, \end{aligned}$$

recalling that \((x)_{r} = x(x+1)\cdots (x+r-1)\) denotes the Pochhammer symbol.
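As an illustration of how the two moments combine, the sketch below evaluates \(\text {Var}(R_{lH})\) numerically. It is only a sketch under stated assumptions: the function names are hypothetical, and \({\mathcal {I}}(c,\rho )\) is passed in as a plain number supplied by the user (for instance \(1/(1+c)\) in the Dirichlet case) rather than computed from \(\rho \).

```python
def pochhammer(x: float, r: int) -> float:
    """Rising factorial (x)_r = x (x+1) ... (x+r-1)."""
    out = 1.0
    for j in range(r):
        out *= x + j
    return out


def residual_moment(c0: float, sigma0: float, H: int, r: int) -> float:
    """E(R_0H^r) for r = 1 or 2, as a product over the first H sticks."""
    out = 1.0
    for h in range(1, H + 1):
        out *= (pochhammer(c0 + sigma0 * h, r)
                / pochhammer(c0 + sigma0 * (h - 1) + 1.0, r))
    return out


def truncation_error_variance(c0: float, sigma0: float, H: int,
                              I_c_rho: float) -> float:
    """Var(R_lH) = I(c,rho) * E(R_0H - R_0H^2) + E(R_0H^2) - E(R_0H)^2."""
    m1 = residual_moment(c0, sigma0, H, 1)
    m2 = residual_moment(c0, sigma0, H, 2)
    return I_c_rho * (m1 - m2) + m2 - m1 ** 2
```

Setting `I_c_rho` to zero isolates the contribution of the root-level process, which can help judge how much of the truncation variability is due to the second layer of the hierarchy.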

1.2 Proof of Theorem 2

First note that the expected value of the truncated Pitman–Yor process \({\tilde{p}}_0^H \sim {\textsc {py}}_H(\sigma _0,c_0,P_0)\), for any \(A \in {\mathscr {X}}\) and any \(H=1,2,\dots \), is equal to the baseline measure

$$\begin{aligned} \mathbb {E}({\tilde{p}}_0^H(A)) = \sum _{h=1}^H \mathbb {E}(\pi _{0h})\mathbb {E}(\delta _{\phi _h}(A)) = P_0(A). \end{aligned}$$
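The identity above can be checked by simulation. The sketch below is an assumption-laden illustration rather than the paper's sampler: it takes the stick-breaking variables to be \(\nu _{0h} \sim \text {Beta}(1-\sigma _0, c_0 + h\sigma _0)\), which is consistent with the moments used in this appendix, sets the last weight to the leftover mass, and uses a uniform \(P_0\) on (0, 1) with \(A = [0, a)\). All function names are hypothetical.

```python
import random


def truncated_py_weights(c0, sigma0, H, rng):
    """Draw (pi_01, ..., pi_0H); the H-th weight absorbs the remaining mass."""
    weights, remaining = [], 1.0
    for h in range(1, H):
        nu = rng.betavariate(1.0 - sigma0, c0 + sigma0 * h)
        weights.append(nu * remaining)
        remaining *= 1.0 - nu
    weights.append(remaining)  # equivalent to fixing nu_0H = 1
    return weights


def mc_mean_mass(c0, sigma0, H, a, n_draws, seed=0):
    """Monte Carlo estimate of E(p0^H(A)) for A = [0, a), P0 = Uniform(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        weights = truncated_py_weights(c0, sigma0, H, rng)
        # Atoms phi_h are i.i.d. from P0; sum the weights of atoms in A.
        total += sum(w for w in weights if rng.random() < a)
    return total / n_draws
```

Under these assumptions, `mc_mean_mass(1.0, 0.25, 20, 0.3, 5000)` should be close to 0.3 for any truncation level, since the truncated weights still sum to one.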

Moreover, one can show that

$$\begin{aligned} \text {Var}({\tilde{p}}_0^H(A)) = P_0(A)(1-P_0(A))\sum _{h=1}^H\mathbb {E}(\pi _{0h}^2), \end{aligned}$$

for any \(H = 1,2,\ldots \), and \(A \in {\mathscr {X}}\). Define \({\mathcal {I}}_0(\sigma _0,c_0,H) = \sum _{h=1}^H\mathbb {E}(\pi _{0h}^2)\) and recall that \({\mathcal {I}}(c,\rho ) = c \int _{\mathbb {R}^+}u \mathrm {e}^{-c \psi (u)} \tau _2(u) \,{\text {d}}{u}\) with \(\tau _2(u) = \int _{\mathbb {R}^+}s^2 \mathrm {e}^{-us}\rho ( \,{\text {d}}{s})\). From Proposition 1 of James et al. (2006), and since \({\tilde{p}}_0^H\) acts as the baseline measure of \({\tilde{p}}_l^H\) given \({\tilde{p}}_0^H\), one has that \(\text {Var}({\tilde{p}}_l^H(A) \mid {\tilde{p}}_0^H) = {\tilde{p}}_0^H(A)(1- {\tilde{p}}_0^H(A)){\mathcal {I}}(c,\rho )\) for any \(A \in {\mathscr {X}}\). Hence, for any \(l=1,\dots ,d\),

$$\begin{aligned} \begin{aligned} \text {Var}({\tilde{p}}_l^H(A))&= \mathbb {E}(\text {Var}({\tilde{p}}_l^H(A) \mid {\tilde{p}}_0^H)) + \text {Var}({\tilde{p}}_0^H(A)) \\&= {\mathcal {I}}(c,\rho )\mathbb {E}({\tilde{p}}_0^H(A)(1 - {\tilde{p}}_0^H(A))) \\&\quad + P_0(A)(1-P_0(A)){\mathcal {I}}_0(\sigma _0,c_0,H) \\&= P_0(A)(1 - P_0(A))({\mathcal {I}}(c,\rho ) - {\mathcal {I}}(c,\rho ){\mathcal {I}}_0(\sigma _0,c_0,H) \\&\quad + {\mathcal {I}}_0(\sigma _0,c_0,H)). \end{aligned} \end{aligned}$$
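As a worked instance, suppose the group-level processes are Dirichlet processes, that is \(\rho (\,{\text {d}}{s}) = s^{-1}\mathrm {e}^{-s}\,{\text {d}}{s}\), and assume that \(\psi (u) = \int _{\mathbb {R}^+}(1 - \mathrm {e}^{-us})\rho (\,{\text {d}}{s})\) denotes the Laplace exponent (its definition is not restated in this appendix). Then \(\psi (u) = \log (1+u)\) and \(\tau _2(u) = (1+u)^{-2}\), so that

$$\begin{aligned} \begin{aligned} {\mathcal {I}}(c,\rho )&= c \int _{\mathbb {R}^+} u (1+u)^{-c-2} \,{\text {d}}{u} = c\, \frac{\Gamma (2)\Gamma (c)}{\Gamma (c+2)} = \frac{1}{1+c}, \\ \text {Var}({\tilde{p}}_l^H(A))&= P_0(A)(1 - P_0(A))\, \frac{1 + c\, {\mathcal {I}}_0(\sigma _0,c_0,H)}{1+c}. \end{aligned} \end{aligned}$$

In this special case a larger c shifts the variance towards the root-level term \({\mathcal {I}}_0(\sigma _0,c_0,H)\), in line with the group-specific measures concentrating around \({\tilde{p}}_0^H\).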

Moreover, following Camerlenghi et al. (2019, Appendix A.1), for any \(l \ne l'\)

$$\begin{aligned} \begin{aligned} \text {Cov}({\tilde{p}}_l^H(A),{\tilde{p}}_{l'}^H(A))&= \text {Var}({\tilde{p}}_0^H(A)) \\&= P_0(A)(1-P_0(A)){\mathcal {I}}_0(\sigma _0,c_0,H), \end{aligned} \end{aligned}$$

from which it follows that

$$\begin{aligned} \text {Cor}({\tilde{p}}_l^H(A),{\tilde{p}}_{l'}^H(A)) = \frac{{\mathcal {I}}_0(\sigma _0,c_0,H)}{{\mathcal {I}}(c,\rho ) + {\mathcal {I}}_0(\sigma _0,c_0,H)(1- {\mathcal {I}}(c,\rho ))}. \end{aligned}$$

It remains to derive an explicit expression for \({\mathcal {I}}_0(\sigma _0,c_0,H)\), which equals

$$\begin{aligned} \begin{aligned}&{\mathcal {I}}_0(\sigma _0,c_0,H) = \sum _{h=1}^H\mathbb {E}(\pi _{0h}^2) = \sum _{h=1}^H \mathbb {E}\left( \nu _{0h}^2\prod _{l=1}^{h-1}(1-\nu _{0l})^2\right) \\&\quad = \sum _{h=1}^{H-1} \left[ \frac{(1 - \sigma _0)_2}{(1 + c_0 + (h-1)\sigma _0)_2} \left( \prod _{l=1}^{h-1} \frac{(c_0 + l\sigma _0)_2}{(1 + c_0 + (l-1)\sigma _0)_2}\right) \right] \\&\qquad + \left( \prod _{l=1}^{H-1} \frac{(c_0 + l\sigma _0)_2}{(1 + c_0 + (l-1)\sigma _0)_2}\right) . \end{aligned} \end{aligned}$$

All the above results also hold in the infinite case, after replacing \({\mathcal {I}}_0(\sigma _0,c_0,H)\) with its limit \({\mathcal {I}}_0(\sigma _0,c_0)\), so that

$$\begin{aligned} \begin{aligned} \lim _{H \rightarrow +\infty } {\mathcal {I}}_0(\sigma _0,c_0,H)&= {\mathcal {I}}_0(\sigma _0,c_0) = \mathbb {E}\left( \sum _{h=1}^\infty \pi _{0h}^2\right) \\&= \sum _{h=1}^\infty \mathbb {E}\left( \pi _{0h}^2\right) = \frac{1 - \sigma _0}{1 + c_0}, \end{aligned} \end{aligned}$$

where the last equality follows for instance from Ishwaran and James (2001, Appendix A.2).
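The finite-H expression for \({\mathcal {I}}_0(\sigma _0,c_0,H)\), its limit, and the resulting correlation can be evaluated directly. The following Python sketch uses hypothetical function names and takes \({\mathcal {I}}(c,\rho )\) as a user-supplied number; it is meant as a numerical check of the formulas above, not as the authors' implementation.

```python
def poch2(x: float) -> float:
    """Pochhammer symbol (x)_2 = x (x + 1)."""
    return x * (x + 1.0)


def I0_truncated(sigma0: float, c0: float, H: int) -> float:
    """I_0(sigma0, c0, H) = sum_{h=1}^H E(pi_0h^2) for the truncated process."""
    total = 0.0
    prod = 1.0  # running product over l = 1, ..., h-1
    for h in range(1, H):
        denom = poch2(1.0 + c0 + (h - 1) * sigma0)
        total += poch2(1.0 - sigma0) / denom * prod
        prod *= poch2(c0 + h * sigma0) / denom
    # The last weight is the leftover mass, contributing the bare product.
    return total + prod


def I0_limit(sigma0: float, c0: float) -> float:
    """Limit of I_0(sigma0, c0, H) as H grows: (1 - sigma0) / (1 + c0)."""
    return (1.0 - sigma0) / (1.0 + c0)


def correlation(I_c_rho: float, I0: float) -> float:
    """Cor(p_l^H(A), p_l'^H(A)) = I0 / (I + I0 * (1 - I))."""
    return I0 / (I_c_rho + I0 * (1.0 - I_c_rho))
```

For example, with \(\sigma _0 = 0\) and \(c_0 = 1\) the truncated value equals 1 at H = 1 and 2/3 at H = 2, and approaches the limit 1/2 as H grows; plugging the truncated and limiting values into `correlation` gives a quick sense of how much the truncation inflates the prior correlation between groups.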

1.3 Dataset

Table 3 The Stouffer and Toby (1951) dataset. We report the frequencies for each possible combination of the \(2^4 = 16\) responses, divided over the three groups ego, smith and friend.

We report in Table 3 the dataset used in the illustrative analysis of Sect. 6 and originally presented in Stouffer and Toby (1951).

Cite this article

Lijoi, A., Prünster, I. & Rigon, T. Sampling hierarchies of discrete random structures. Stat Comput 30, 1591–1607 (2020). https://doi.org/10.1007/s11222-020-09961-7
