Skip to main content
Log in

Approximate Bayesian computation in controlled branching processes: the role of summary statistics

  • Original Paper
  • Published:
Revista de la Real Academia de Ciencias Exactas, Físicas y Naturales. Serie A. Matemáticas Aims and scope Submit manuscript

Abstract

A controlled branching process is a stochastic growth population model in which the number of individuals with reproductive capacity in each generation is given by a random control function. The purpose of the present work was to examine the Approximate Bayesian Computation sequential Monte Carlo method, and to propose appropriate summary statistics for these processes. The method’s success is shown to rely on this latter issue, and its accuracy is illustrated and compared with a “likelihood free” Markov chain Monte Carlo technique by means of a simulated example. How to extend the method to a controlled multitype branching process is also illustrated, and an application is made to model real data from the cell kinetics field. Both illustrations are developed using the R statistical software environment.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

References

  1. Beaumont, M.A., Cornuet, J.M., Marin, J.M., Robert, C.P.: Adaptive approximate bayesian computation. Biometrika 96(4), 983–990 (2009)

    MathSciNet  MATH  Google Scholar 

  2. Beaumont, M.A., Zhang, W., Balding, D.J.: Approximate bayesian computation in population genetics. Genetic 162(4), 2025–2035 (2002)

    Google Scholar 

  3. Becker, N.: On parametric estimation for mortal branching processes. Biometrika 61(3), 393–399 (1974)

    MathSciNet  MATH  Google Scholar 

  4. Corral, A., García-Millán, R., Font-Clos, F.: Exact derivation of a finite-size scaling law and corrections to scaling in the geometric Galton–Watson process. PLoS One 11(9), e0161586 (2016)

    Google Scholar 

  5. Drovandi, C.C., Pettitt, A.N., McCutchan, R.A.: Exact and approximate Bayesian inference for low integer-valued time series models with intractable likelihoods. Bayesian Anal. 11(2), 325–352 (2016). https://doi.org/10.1214/15-BA950

    Article  MathSciNet  MATH  Google Scholar 

  6. Filippi, S., Barnes, C.P., Cornebise, J., Stumpf, M.P.H.: On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo. Stat. Appl. Genet. Mol. Biol. 12(1), 87–107 (2013). https://doi.org/10.1515/sagmb-2012-0069

    Article  MathSciNet  Google Scholar 

  7. Frazier, D.T., Robert, C.P., Rousseau, J.: Model misspecification in approximate Bayesian computation: consequences and diagnostics. J. R. Stat. Soc. Ser. B (Stat. Methodol.) https://doi.org/10.1111/rssb.12356,(2020)

  8. Genz, A., Bretz, F., Miwa, T., Mi, X., Leisch, F., Scheipl, F., Hothorn, T.: mvtnorm: Multivariate Normal and t Distributions (2020) R package version 1.1–0. https://CRAN.R-project.org/package=mvtnorm

  9. González, M., Gutiérrez, C., Martínez, R., Minuesa, C., del Puerto, I.: Bayesian analysis for controlled branching processes. In: I. del Puerto, M. González, C. Gutiérrez, R. Martínez, C. Minuesa, M. Molina, M. Mota, A. Ramos (eds.) Branching Processes and Their Applications, Lecture Notes in Statistics, vol. 219, pp. 185–205. Springer, Switzerland (2016)

  10. González, M., Gutiérrez, C., Martínez, R., del Puerto, I.: Bayesian inference for controlled branching processes through MCMC and ABC methodologies. Revista de la Real Academia de Ciencias Exactas, Físicas y Naturales. Serie A. Matemáticas 107(2), 459–473 (2013)

    MathSciNet  MATH  Google Scholar 

  11. González, M., Minuesa, C., del Puerto, I.: Maximum likelihood estimation and Expectation–Maximization algorithm for controlled branching processes. Comput. Stat. Data Anal. 93, 209–227 (2016)

    MathSciNet  MATH  Google Scholar 

  12. González, M., Minuesa, C., del Puerto, I.: Minimum disparity estimation in controlled branching processes. Electron. J. Stat. 11(1), 295–325 (2017)

    MathSciNet  MATH  Google Scholar 

  13. González, M., Minuesa, C., del Puerto, I., Vidyashankar, A.N.: Robust estimation in controlled branching processes: Bayesian estimators via disparities, 1–62 (2018) arXiv:1802.05917

  14. González, M., del Puerto, I., Yanev, G.P.: Controlled branching processes. ISTE Ltd, London, Wiley, Hoboken (2018)

    Google Scholar 

  15. Guttorp, P.: Statistical inference for branching processes. Wiley, New York (1991)

    MATH  Google Scholar 

  16. Guttorp, P., Perlman, M.D.: Predicting extinction or explosion in a Galton-Watson branching process with power series offspring distribution. J. Stat. Plan. Infer. 165, 193–215 (2015)

    MathSciNet  MATH  Google Scholar 

  17. Holgate, P., Lakhani, K.H.: Effect of offspring distribution on population survival. Bull. Math. Biophys. 29(4), 831–839 (1967)

    Google Scholar 

  18. Hyrien, O., Ambeskovic, I., Mayer-Proschel, M., Noble, M., Yakovlev, A.: Stochastic modeling of oligodendrocyte generation in cell culture: model validation with time-lapse data. Theor. Biol. Med. Model. 3(1), 21 (2006)

    Google Scholar 

  19. Lintusaari, J., Gutmann, M.U., Dutta, R., Kaski, S., Corander, J.: Fundamentals and recent developments in approximate Bayesian Computation. Syst. Biol. 66(1), e66–e68 (2017)

    Google Scholar 

  20. Martínez, R., Mota, M., del Puerto, I.: On asymptotic posterior normality for controlled branching processes. Statistics 43(4), 367–378 (2009)

    MathSciNet  MATH  Google Scholar 

  21. McKinley, T., Cook, A.R., Deardon, R.: Inference in epidemic models without likelihoods. Int. J. Biostat. 5(1), 24 (2009)

    MathSciNet  Google Scholar 

  22. Owen, J., Wilkinson, D.J., Gillespie, C.S.: Likelihood free inference for Markov processes: a comparison. Stat. Appl. Genet. Mol. Biol. 14(2), 189–209 (2015)

    MathSciNet  MATH  Google Scholar 

  23. Plummer, M., Best, N., Cowles, K., Vines, K.: CODA: convergence diagnosis and output analysis for MCMC. R News 6(1), 7–11 (2006)

    Google Scholar 

  24. Prangle, D.: Adapting the ABC distance function. Bayesian Anal. 12(1), 289–309 (2017)

    MathSciNet  MATH  Google Scholar 

  25. R Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2020). http://www.R-project.org/

  26. Robert, C.P.: Approximate Bayesian Computation: A survey on recent results. In: R. Cools, D. Nuyens (eds.) Monte Carlo and Quasi-Monte Carlo Methods, Springer Proceedings in Mathematics & Statistics, vol. 163, pp. 185–205. Springer International Publishing (2016)

  27. Sevast’yanov, B.A., Zubkov, A.M.: Controlled branching processes. Theory Prob. Appl. 19(1), 15–25 (1974)

    MathSciNet  MATH  Google Scholar 

  28. Venables, W.N., Ripley, B.D.: Modern applied statistics with S. Statistics and computing, 4th edn. Springer, New York (2002)

    MATH  Google Scholar 

  29. Weiß, C.H.: Fully observed INAR(1) processes. J. Appl. Stat. 39(3), 581–598 (2012)

    MathSciNet  Google Scholar 

  30. Yakovlev, A.Y., Stoimenova, V.K., Yanev, N.M.: Branching processes as models of progenitor cell populations and estimation of the offspring distributions. J. Am. Stat. Assoc. 103(484), 1357–1366 (2008)

    MathSciNet  MATH  Google Scholar 

  31. Yanev, N.M.: Conditions for degeneracy of \(\phi \)-branching processes with random \(\phi \). Theory Prob. Appl. 20, 421–428 (1975)

    MathSciNet  Google Scholar 

Download references

Acknowledgements

The authors would like to thank the Associate Editor and the Reviewers for providing valuable comments and suggestions which have improved this paper. This research was supported by the Ministerio de Educación, Cultura y Deporte (Grant FPU13/03213), the Ministerio de Economía y Competitividad (Grant MTM2015-70522-P), the Junta de Extremadura (Grant IB16099), and the Fondo Europeo de Desarrollo Regional.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Inés del Puerto.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Supplementary material

Simulated data

The data for the simulated example presented in Sect. 4 are provided in Table 3. Recall that for the simulated CBP, which starts with \(Z_0=1\) individual, the reproduction law is a geometric distribution with parameter \(q=0.4\), and for each \(k\in \mathbb {N}_0\), the probability distribution of the control variable \(\phi _n(k)\) is a binomial distribution with parameters \(\xi (k)\) and \(\gamma =0.75\), with \(\xi (k)= k+\lfloor \log (k)\rfloor \), for each \(k\in \mathbb {N}\) and \(\xi (0)=0\).

Table 3 Simulated data, \(\widetilde{{\mathcal {Z}}}_{30}^{obs}\)

Summary statistics

In order to highlight the importance of the choice of the summary statistic in the output of ABC algorithms, we also implemented them removing one of the proposed algorithm’s coordinates, i.e., we considered the summary statistics:

$$\begin{aligned} {\mathcal {S}}_1(\widetilde{{\mathcal {Z}}}_n)&=\left( \frac{\sum _{i=1}^n Z_i}{\sum _{i=0}^{n-1} Z_i},\frac{\phi _{n-1}(Z_{n-1})}{Z_{n-1}}\right) ,\\ {\mathcal {S}}_2(\widetilde{{\mathcal {Z}}}_n)&=\left( \sum _{i=1}^n Z_i,\frac{\phi _{n-1}(Z_{n-1})}{Z_{n-1}}\right) ,\\ {\mathcal {S}}_3(\widetilde{{\mathcal {Z}}}_n)&=\left( \sum _{i=1}^n Z_i,\frac{\sum _{i=1}^n Z_i}{\sum _{i=0}^{n-1} Z_i}\right) . \end{aligned}$$

The results for the parameters \(\theta \), \(\gamma \), and \(\tau _m\) are presented in Tables 45, and 6, respectively. One observes from these tables that removing the number of progenitors in the final generation from the observed sample, and consequently from the summary statistic, i.e., \({\mathcal {S}}_3\), provides similar accuracy measures for \(\tau _m\) – a consequence of the first convergence in (3). However, the estimates of its factors separately, i.e., m (dependent on \(\theta \)) and \(\gamma \), get worse. There particularly stands out the less accurate estimate of the control parameter, \(\gamma \), with a very wide 95% HPD interval. If the total progeny component is removed, i.e., \({\mathcal {S}}_1\), the integrated squared error (ISE) and the Kullback-Leibler divergence between the posterior densities (KL) for three parameters, \(\theta \), \(\gamma \), and \(\tau _m\), increase significantly. For ISE and KL, the posterior density given by the Gibbs sampler algorithm was taken as the referent. For the latter parameter, the 95% HPD intervals do not contain the true value. Finally, removing the second coordinate in \({\mathcal {S}}\), i.e., \({\mathcal {S}}_2\), results in a worsening of the three accuracy measures for the three parameters considered.

Table 4 Summary of the estimates of the posterior density \(\pi (\theta |\widetilde{{\mathcal {Z}}}_{30})\) by the different methods. RMSE: the relative mean square error; ISE: the integrated squared error; KL: the Kullback-Leibler divergence between the posterior densities. For ISE and KL, the posterior density given by the Gibbs sampler algorithm was taken as the referent
Table 5 Summary of the estimates of the posterior density \(\pi (\gamma |\widetilde{{\mathcal {Z}}}_{30})\) by the different methods. RMSE: the relative mean square error; ISE: the integrated squared error; KL: the Kullback-Leibler divergence between the posterior densities. For ISE and KL, the posterior density given by the Gibbs sampler algorithm was taken as the referent
Table 6 Summary of the estimates of the posterior density \(\pi (\tau _m|\widetilde{{\mathcal {Z}}}_{30})\) by the different methods. RMSE: the relative mean square error; ISE: the integrated squared error; KL: the Kullback-Leibler divergence between the posterior densities. For ISE and KL, the posterior density given by the Gibbs sampler algorithm was taken as the referent
Table 7 Summary of the sensitivity analysis regarding the choice of the prior distribution for the posterior density \(\pi (\theta |\widetilde{{\mathcal {Z}}}_{30})\)
Table 8 Summary of the sensitivity analysis regarding the choice of the prior distribution for the posterior density \(\pi (\gamma |\widetilde{{\mathcal {Z}}}_{30})\)
Table 9 Summary of the sensitivity analysis regarding the choice of the offspring and control distributions in the SMC ABC algorithm with the summary statistic and post-processing method. A geometric distribution and a Poisson distribution were used for the reproduction law, and a binomial distribution, a Poisson distribution, and a negative binomial distribution for the control laws
Table 10 Summary of the sensitivity analysis regarding the choice of the offspring and control distributions in the SMC ABC algorithm with the summary statistic and post-processing method without knowledge of the function \(\xi (\cdot )\). A geometric distribution and a Poisson distribution were used for the reproduction law, and a binomial distribution, a Poisson distribution, and a negative binomial distribution for the control laws

Sensitivity analysis

We performed a sensitivity analysis regarding the choices of the prior distributions. Recall that both the offspring and the control parameters are probabilities, so that beta distributions seemed to be reasonable options as their prior distributions. In this analysis therefore, we considered different values for the shape parameters of beta distributions. The results for the Gibbs sampler algorithm and the SMC ABC algorithm with the local lineal regression adjustment using the metric \(\rho _1\) are summarized in Tables 7 and 8 for the posterior densities \(\pi (\theta |\widetilde{{\mathcal {Z}}}_{30})\) and \(\pi (\gamma |\widetilde{{\mathcal {Z}}}_{30})\), respectively. They indicate that the estimation of the posterior densities is not sensitive to the choices of the prior distributions.

It is worth mentioning again that the ABC method relies on having ease of sampling from the model. In our implementation, this implies knowledge of the parametric families. An interesting issue is to study the performance of the algorithms when one knows that those distributions can be parametrized with a one-dimensional parameter, but the kind of parametric distribution one should use to that end is unknown. For this purpose, we shall here present a summary of the results of the sensitivity analysis regarding the choice of the parametric families for the offspring and control distributions for three different CBPs. Apart from the previous example, we simulated 30 generations of two additional CBPs starting with one individual: the first of them has a geometric distribution with parameter \(\theta =0.918\) as the offspring distribution and the control variables \(\phi _n(k)\) follow binomial distributions with parameters \(\xi (k)\) and \(\gamma =0.1\), and, in the second, the offspring distribution is a geometric distribution with parameter \(\theta =0.556\) and it has control variables \(\phi _n(k)\) following a binomial distribution with parameters \(\xi (k)\) and \(\gamma =0.9\). We proposed two offspring laws (geometric and Poisson distributions) and three control laws (binomial, Poisson, and negative binomial distributions), and ran the SMC ABC algorithm for the posterior densities \(\pi (m|\widetilde{{\mathcal {Z}}}_{30})\), \(\pi (\tau |\widetilde{{\mathcal {Z}}}_{30})\), and \(\pi (\tau _m|\widetilde{{\mathcal {Z}}}_{30})\) in each of these six situations for the three examples considered. We opted for these three parameters since it is more reasonable to compare the results based on different distributions in terms of the offspring mean, the parameter \(\tau \), and the asymptotic mean growth rate of the process, which are the stable parameters of the process.

The results are given in Table 9. They show that, for the different models used for simulating, the method usually identifies relatively well the offspring mean, the parameter \(\tau \), and the asymptotic mean growth rate, which is also indicative of the goodness of our summary statistic. Particularly noteworthy is the case of the second simulated model when a Poisson distribution is taken for the control laws. The algorithm gives a good estimate for the posterior density of the parameter \(\tau \), but for that of m (and consequently, of \(\tau _m\)) not as good as when a binomial or negative binomial control law distribution is assumed. This is related to the magnitude of the parameter \(\theta =0.918\). Indeed, when we used the geometric distribution in simulating the model we also estimated the posterior density of \(\theta \), resulting in a mean value of 0.9079 and variance of \(2.0629\cdot 10^{-5}\), which seems to be a good estimate of that posterior density. However, the small bias observed in this estimate was amplified when we estimated m due to the fact that \(m=\theta (1-\theta )^{-1}\).

Similar results were obtained when one is unaware of the existence of the function \(\xi (\cdot )\). To examine that problem in this example, we considered the three models described above, and repeated the study by applying the same ABC method for the posterior distributions \(\pi (m|\widetilde{{\mathcal {Z}}}_{30})\), \(\pi (\tau |\widetilde{{\mathcal {Z}}}_{30})\), and \(\pi (\tau _m|\widetilde{{\mathcal {Z}}}_{30})\). Again, we used geometric and Poisson offspring distributions and binomial, Poisson, and negative binomial control law distributions. The results are presented in Table 10, where one observes that there is no significant difference with respect to those in Table 9 except for a slight increase in the RMSE in most cases. This similarity is due to the fact \(\varepsilon (k)=(k+\lfloor \log (k)\rfloor )\gamma \sim k\gamma \), as \(k\rightarrow \infty \).

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

González, M., Martínez, R., Minuesa, C. et al. Approximate Bayesian computation in controlled branching processes: the role of summary statistics. RACSAM 114, 116 (2020). https://doi.org/10.1007/s13398-020-00839-x

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s13398-020-00839-x

Keywords

Mathematics Subject Classification

Navigation