Approximate Bayesian computation in controlled branching processes: the role of summary statistics

González, Miguel; Martínez, Rodrigo; Minuesa, Carmen; del Puerto, Inés

doi:10.1007/s13398-020-00839-x

Approximate Bayesian computation in controlled branching processes: the role of summary statistics

Original Paper
Published: 20 April 2020

Volume 114, article number 116, (2020)
Cite this article

Revista de la Real Academia de Ciencias Exactas, Físicas y Naturales. Serie A. Matemáticas Aims and scope Submit manuscript

242 Accesses
1 Citation
Explore all metrics

Abstract

A controlled branching process is a stochastic growth population model in which the number of individuals with reproductive capacity in each generation is given by a random control function. The purpose of the present work was to examine the Approximate Bayesian Computation sequential Monte Carlo method, and to propose appropriate summary statistics for these processes. The method’s success is shown to rely on this latter issue, and its accuracy is illustrated and compared with a “likelihood free” Markov chain Monte Carlo technique by means of a simulated example. How to extend the method to a controlled multitype branching process is also illustrated, and an application is made to model real data from the cell kinetics field. Both illustrations are developed using the R statistical software environment.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Approximate Bayesian computation approach on the maximal offspring and parameters in controlled branching processes

Article Open access 07 July 2022

Miguel González, Carmen Minuesa & Inés del Puerto

EM Algorithm for Estimation of the Offspring Probabilities in Some Branching Models

Notes on large deviations for branching processes indexed by a Poisson process

Article 01 January 2020

Zhenlong Gao

References

Beaumont, M.A., Cornuet, J.M., Marin, J.M., Robert, C.P.: Adaptive approximate bayesian computation. Biometrika 96(4), 983–990 (2009)
MathSciNet MATH Google Scholar
Beaumont, M.A., Zhang, W., Balding, D.J.: Approximate bayesian computation in population genetics. Genetic 162(4), 2025–2035 (2002)
Google Scholar
Becker, N.: On parametric estimation for mortal branching processes. Biometrika 61(3), 393–399 (1974)
MathSciNet MATH Google Scholar
Corral, A., García-Millán, R., Font-Clos, F.: Exact derivation of a finite-size scaling law and corrections to scaling in the geometric Galton–Watson process. PLoS One 11(9), e0161586 (2016)
Google Scholar
Drovandi, C.C., Pettitt, A.N., McCutchan, R.A.: Exact and approximate Bayesian inference for low integer-valued time series models with intractable likelihoods. Bayesian Anal. 11(2), 325–352 (2016). https://doi.org/10.1214/15-BA950
Article MathSciNet MATH Google Scholar
Filippi, S., Barnes, C.P., Cornebise, J., Stumpf, M.P.H.: On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo. Stat. Appl. Genet. Mol. Biol. 12(1), 87–107 (2013). https://doi.org/10.1515/sagmb-2012-0069
Article MathSciNet Google Scholar
Frazier, D.T., Robert, C.P., Rousseau, J.: Model misspecification in approximate Bayesian computation: consequences and diagnostics. J. R. Stat. Soc. Ser. B (Stat. Methodol.) https://doi.org/10.1111/rssb.12356,(2020)
Genz, A., Bretz, F., Miwa, T., Mi, X., Leisch, F., Scheipl, F., Hothorn, T.: mvtnorm: Multivariate Normal and t Distributions (2020) R package version 1.1–0. https://CRAN.R-project.org/package=mvtnorm
González, M., Gutiérrez, C., Martínez, R., Minuesa, C., del Puerto, I.: Bayesian analysis for controlled branching processes. In: I. del Puerto, M. González, C. Gutiérrez, R. Martínez, C. Minuesa, M. Molina, M. Mota, A. Ramos (eds.) Branching Processes and Their Applications, Lecture Notes in Statistics, vol. 219, pp. 185–205. Springer, Switzerland (2016)
González, M., Gutiérrez, C., Martínez, R., del Puerto, I.: Bayesian inference for controlled branching processes through MCMC and ABC methodologies. Revista de la Real Academia de Ciencias Exactas, Físicas y Naturales. Serie A. Matemáticas 107(2), 459–473 (2013)
MathSciNet MATH Google Scholar
González, M., Minuesa, C., del Puerto, I.: Maximum likelihood estimation and Expectation–Maximization algorithm for controlled branching processes. Comput. Stat. Data Anal. 93, 209–227 (2016)
MathSciNet MATH Google Scholar
González, M., Minuesa, C., del Puerto, I.: Minimum disparity estimation in controlled branching processes. Electron. J. Stat. 11(1), 295–325 (2017)
MathSciNet MATH Google Scholar
González, M., Minuesa, C., del Puerto, I., Vidyashankar, A.N.: Robust estimation in controlled branching processes: Bayesian estimators via disparities, 1–62 (2018) arXiv:1802.05917
González, M., del Puerto, I., Yanev, G.P.: Controlled branching processes. ISTE Ltd, London, Wiley, Hoboken (2018)
Google Scholar
Guttorp, P.: Statistical inference for branching processes. Wiley, New York (1991)
MATH Google Scholar
Guttorp, P., Perlman, M.D.: Predicting extinction or explosion in a Galton-Watson branching process with power series offspring distribution. J. Stat. Plan. Infer. 165, 193–215 (2015)
MathSciNet MATH Google Scholar
Holgate, P., Lakhani, K.H.: Effect of offspring distribution on population survival. Bull. Math. Biophys. 29(4), 831–839 (1967)
Google Scholar
Hyrien, O., Ambeskovic, I., Mayer-Proschel, M., Noble, M., Yakovlev, A.: Stochastic modeling of oligodendrocyte generation in cell culture: model validation with time-lapse data. Theor. Biol. Med. Model. 3(1), 21 (2006)
Google Scholar
Lintusaari, J., Gutmann, M.U., Dutta, R., Kaski, S., Corander, J.: Fundamentals and recent developments in approximate Bayesian Computation. Syst. Biol. 66(1), e66–e68 (2017)
Google Scholar
Martínez, R., Mota, M., del Puerto, I.: On asymptotic posterior normality for controlled branching processes. Statistics 43(4), 367–378 (2009)
MathSciNet MATH Google Scholar
McKinley, T., Cook, A.R., Deardon, R.: Inference in epidemic models without likelihoods. Int. J. Biostat. 5(1), 24 (2009)
MathSciNet Google Scholar
Owen, J., Wilkinson, D.J., Gillespie, C.S.: Likelihood free inference for Markov processes: a comparison. Stat. Appl. Genet. Mol. Biol. 14(2), 189–209 (2015)
MathSciNet MATH Google Scholar
Plummer, M., Best, N., Cowles, K., Vines, K.: CODA: convergence diagnosis and output analysis for MCMC. R News 6(1), 7–11 (2006)
Google Scholar
Prangle, D.: Adapting the ABC distance function. Bayesian Anal. 12(1), 289–309 (2017)
MathSciNet MATH Google Scholar
R Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2020). http://www.R-project.org/
Robert, C.P.: Approximate Bayesian Computation: A survey on recent results. In: R. Cools, D. Nuyens (eds.) Monte Carlo and Quasi-Monte Carlo Methods, Springer Proceedings in Mathematics & Statistics, vol. 163, pp. 185–205. Springer International Publishing (2016)
Sevast’yanov, B.A., Zubkov, A.M.: Controlled branching processes. Theory Prob. Appl. 19(1), 15–25 (1974)
MathSciNet MATH Google Scholar
Venables, W.N., Ripley, B.D.: Modern applied statistics with S. Statistics and computing, 4th edn. Springer, New York (2002)
MATH Google Scholar
Weiß, C.H.: Fully observed INAR(1) processes. J. Appl. Stat. 39(3), 581–598 (2012)
MathSciNet Google Scholar
Yakovlev, A.Y., Stoimenova, V.K., Yanev, N.M.: Branching processes as models of progenitor cell populations and estimation of the offspring distributions. J. Am. Stat. Assoc. 103(484), 1357–1366 (2008)
MathSciNet MATH Google Scholar
Yanev, N.M.: Conditions for degeneracy of $\phi $-branching processes with random $\phi $. Theory Prob. Appl. 20, 421–428 (1975)
MathSciNet Google Scholar

Download references

Acknowledgements

The authors would like to thank the Associate Editor and the Reviewers for providing valuable comments and suggestions which have improved this paper. This research was supported by the Ministerio de Educación, Cultura y Deporte (Grant FPU13/03213), the Ministerio de Economía y Competitividad (Grant MTM2015-70522-P), the Junta de Extremadura (Grant IB16099), and the Fondo Europeo de Desarrollo Regional.

Author information

Authors and Affiliations

Department of Mathematics, Faculty of Sciences, and Instituto de Computación Científica Avanzada, University of Extremadura, 06006, Badajoz, Spain
Miguel González & Inés del Puerto
Department of Mathematics, University Centre of Plasencia, and Instituto de Computación Científica Avanzada, University of Extremadura, 10600, Plasencia, Spain
Rodrigo Martínez
Department of Mathematics, Faculty of Sciences, Autonomous University of Madrid, Cuidad Universitaria de Cantoblanco, 28049, Madrid, Spain
Carmen Minuesa

Authors

Miguel González
View author publications
You can also search for this author in PubMed Google Scholar
Rodrigo Martínez
View author publications
You can also search for this author in PubMed Google Scholar
Carmen Minuesa
View author publications
You can also search for this author in PubMed Google Scholar
Inés del Puerto
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Inés del Puerto.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Supplementary material

Simulated data

The data for the simulated example presented in Sect. 4 are provided in Table 3. Recall that for the simulated CBP, which starts with $Z_0=1$ individual, the reproduction law is a geometric distribution with parameter $q=0.4$, and for each $k\in \mathbb {N}_0$, the probability distribution of the control variable $\phi _n(k)$ is a binomial distribution with parameters $\xi (k)$ and $\gamma =0.75$, with $\xi (k)= k+\lfloor \log (k)\rfloor $, for each $k\in \mathbb {N}$ and $\xi (0)=0$.

Table 3 Simulated data, $\widetilde{{\mathcal {Z}}}_{30}^{obs}$

Full size table

Summary statistics

In order to highlight the importance of the choice of the summary statistic in the output of ABC algorithms, we also implemented them removing one of the proposed algorithm’s coordinates, i.e., we considered the summary statistics:

$$\begin{aligned} {\mathcal {S}}_1(\widetilde{{\mathcal {Z}}}_n)&=\left( \frac{\sum _{i=1}^n Z_i}{\sum _{i=0}^{n-1} Z_i},\frac{\phi _{n-1}(Z_{n-1})}{Z_{n-1}}\right) ,\\ {\mathcal {S}}_2(\widetilde{{\mathcal {Z}}}_n)&=\left( \sum _{i=1}^n Z_i,\frac{\phi _{n-1}(Z_{n-1})}{Z_{n-1}}\right) ,\\ {\mathcal {S}}_3(\widetilde{{\mathcal {Z}}}_n)&=\left( \sum _{i=1}^n Z_i,\frac{\sum _{i=1}^n Z_i}{\sum _{i=0}^{n-1} Z_i}\right) . \end{aligned}$$

The results for the parameters $\theta $, $\gamma $, and $\tau _m$ are presented in Tables 4, 5, and 6, respectively. One observes from these tables that removing the number of progenitors in the final generation from the observed sample, and consequently from the summary statistic, i.e., ${\mathcal {S}}_3$, provides similar accuracy measures for $\tau _m$ – a consequence of the first convergence in (3). However, the estimates of its factors separately, i.e., m (dependent on $\theta $) and $\gamma $, get worse. There particularly stands out the less accurate estimate of the control parameter, $\gamma $, with a very wide 95% HPD interval. If the total progeny component is removed, i.e., ${\mathcal {S}}_1$, the integrated squared error (ISE) and the Kullback-Leibler divergence between the posterior densities (KL) for three parameters, $\theta $, $\gamma $, and $\tau _m$, increase significantly. For ISE and KL, the posterior density given by the Gibbs sampler algorithm was taken as the referent. For the latter parameter, the 95% HPD intervals do not contain the true value. Finally, removing the second coordinate in ${\mathcal {S}}$, i.e., ${\mathcal {S}}_2$, results in a worsening of the three accuracy measures for the three parameters considered.

Table 4 Summary of the estimates of the posterior density $\pi (\theta |\widetilde{{\mathcal {Z}}}_{30})$ by the different methods. RMSE: the relative mean square error; ISE: the integrated squared error; KL: the Kullback-Leibler divergence between the posterior densities. For ISE and KL, the posterior density given by the Gibbs sampler algorithm was taken as the referent

Full size table

Table 5 Summary of the estimates of the posterior density $\pi (\gamma |\widetilde{{\mathcal {Z}}}_{30})$ by the different methods. RMSE: the relative mean square error; ISE: the integrated squared error; KL: the Kullback-Leibler divergence between the posterior densities. For ISE and KL, the posterior density given by the Gibbs sampler algorithm was taken as the referent

Full size table

Table 6 Summary of the estimates of the posterior density $\pi (\tau _m|\widetilde{{\mathcal {Z}}}_{30})$ by the different methods. RMSE: the relative mean square error; ISE: the integrated squared error; KL: the Kullback-Leibler divergence between the posterior densities. For ISE and KL, the posterior density given by the Gibbs sampler algorithm was taken as the referent

Full size table

Table 7 Summary of the sensitivity analysis regarding the choice of the prior distribution for the posterior density $\pi (\theta |\widetilde{{\mathcal {Z}}}_{30})$

Full size table

Table 8 Summary of the sensitivity analysis regarding the choice of the prior distribution for the posterior density $\pi (\gamma |\widetilde{{\mathcal {Z}}}_{30})$

Full size table

Table 9 Summary of the sensitivity analysis regarding the choice of the offspring and control distributions in the SMC ABC algorithm with the summary statistic and post-processing method. A geometric distribution and a Poisson distribution were used for the reproduction law, and a binomial distribution, a Poisson distribution, and a negative binomial distribution for the control laws

Full size table

Table 10 Summary of the sensitivity analysis regarding the choice of the offspring and control distributions in the SMC ABC algorithm with the summary statistic and post-processing method without knowledge of the function $\xi (\cdot )$. A geometric distribution and a Poisson distribution were used for the reproduction law, and a binomial distribution, a Poisson distribution, and a negative binomial distribution for the control laws

Full size table

Sensitivity analysis

We performed a sensitivity analysis regarding the choices of the prior distributions. Recall that both the offspring and the control parameters are probabilities, so that beta distributions seemed to be reasonable options as their prior distributions. In this analysis therefore, we considered different values for the shape parameters of beta distributions. The results for the Gibbs sampler algorithm and the SMC ABC algorithm with the local lineal regression adjustment using the metric $\rho _1$ are summarized in Tables 7 and 8 for the posterior densities $\pi (\theta |\widetilde{{\mathcal {Z}}}_{30})$ and $\pi (\gamma |\widetilde{{\mathcal {Z}}}_{30})$, respectively. They indicate that the estimation of the posterior densities is not sensitive to the choices of the prior distributions.

It is worth mentioning again that the ABC method relies on having ease of sampling from the model. In our implementation, this implies knowledge of the parametric families. An interesting issue is to study the performance of the algorithms when one knows that those distributions can be parametrized with a one-dimensional parameter, but the kind of parametric distribution one should use to that end is unknown. For this purpose, we shall here present a summary of the results of the sensitivity analysis regarding the choice of the parametric families for the offspring and control distributions for three different CBPs. Apart from the previous example, we simulated 30 generations of two additional CBPs starting with one individual: the first of them has a geometric distribution with parameter $\theta =0.918$ as the offspring distribution and the control variables $\phi _n(k)$ follow binomial distributions with parameters $\xi (k)$ and $\gamma =0.1$, and, in the second, the offspring distribution is a geometric distribution with parameter $\theta =0.556$ and it has control variables $\phi _n(k)$ following a binomial distribution with parameters $\xi (k)$ and $\gamma =0.9$. We proposed two offspring laws (geometric and Poisson distributions) and three control laws (binomial, Poisson, and negative binomial distributions), and ran the SMC ABC algorithm for the posterior densities $\pi (m|\widetilde{{\mathcal {Z}}}_{30})$, $\pi (\tau |\widetilde{{\mathcal {Z}}}_{30})$, and $\pi (\tau _m|\widetilde{{\mathcal {Z}}}_{30})$ in each of these six situations for the three examples considered. We opted for these three parameters since it is more reasonable to compare the results based on different distributions in terms of the offspring mean, the parameter $\tau $, and the asymptotic mean growth rate of the process, which are the stable parameters of the process.

The results are given in Table 9. They show that, for the different models used for simulating, the method usually identifies relatively well the offspring mean, the parameter $\tau $, and the asymptotic mean growth rate, which is also indicative of the goodness of our summary statistic. Particularly noteworthy is the case of the second simulated model when a Poisson distribution is taken for the control laws. The algorithm gives a good estimate for the posterior density of the parameter $\tau $, but for that of m (and consequently, of $\tau _m$) not as good as when a binomial or negative binomial control law distribution is assumed. This is related to the magnitude of the parameter $\theta =0.918$. Indeed, when we used the geometric distribution in simulating the model we also estimated the posterior density of $\theta $, resulting in a mean value of 0.9079 and variance of $2.0629\cdot 10^{-5}$, which seems to be a good estimate of that posterior density. However, the small bias observed in this estimate was amplified when we estimated m due to the fact that $m=\theta (1-\theta )^{-1}$.

Similar results were obtained when one is unaware of the existence of the function $\xi (\cdot )$. To examine that problem in this example, we considered the three models described above, and repeated the study by applying the same ABC method for the posterior distributions $\pi (m|\widetilde{{\mathcal {Z}}}_{30})$, $\pi (\tau |\widetilde{{\mathcal {Z}}}_{30})$, and $\pi (\tau _m|\widetilde{{\mathcal {Z}}}_{30})$. Again, we used geometric and Poisson offspring distributions and binomial, Poisson, and negative binomial control law distributions. The results are presented in Table 10, where one observes that there is no significant difference with respect to those in Table 9 except for a slight increase in the RMSE in most cases. This similarity is due to the fact $\varepsilon (k)=(k+\lfloor \log (k)\rfloor )\gamma \sim k\gamma $, as $k\rightarrow \infty $.

Rights and permissions

Reprints and permissions

About this article

Cite this article

González, M., Martínez, R., Minuesa, C. et al. Approximate Bayesian computation in controlled branching processes: the role of summary statistics. RACSAM 114, 116 (2020). https://doi.org/10.1007/s13398-020-00839-x

Download citation

Received: 13 July 2019
Accepted: 23 March 2020
Published: 20 April 2020
DOI: https://doi.org/10.1007/s13398-020-00839-x

Keywords

Mathematics Subject Classification

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Approximate Bayesian computation in controlled branching processes: the role of summary statistics

Abstract

Access this article

Similar content being viewed by others

Approximate Bayesian computation approach on the maximal offspring and parameters in controlled branching processes

EM Algorithm for Estimation of the Offspring Probabilities in Some Branching Models

Notes on large deviations for branching processes indexed by a Poisson process

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendices

Supplementary material

Simulated data

Summary statistics

Sensitivity analysis

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification

Navigation

Approximate Bayesian computation in controlled branching processes: the role of summary statistics

Abstract

Access this article

Similar content being viewed by others

Approximate Bayesian computation approach on the maximal offspring and parameters in controlled branching processes

EM Algorithm for Estimation of the Offspring Probabilities in Some Branching Models

Notes on large deviations for branching processes indexed by a Poisson process

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendices

Supplementary material

Simulated data

Summary statistics

Sensitivity analysis

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation