Abstract
A controlled branching process is a stochastic growth population model in which the number of individuals with reproductive capacity in each generation is given by a random control function. The purpose of the present work was to examine the Approximate Bayesian Computation sequential Monte Carlo method, and to propose appropriate summary statistics for these processes. The method’s success is shown to rely on this latter issue, and its accuracy is illustrated and compared with a “likelihood free” Markov chain Monte Carlo technique by means of a simulated example. How to extend the method to a controlled multitype branching process is also illustrated, and an application is made to model real data from the cell kinetics field. Both illustrations are developed using the R statistical software environment.
Similar content being viewed by others
References
Beaumont, M.A., Cornuet, J.M., Marin, J.M., Robert, C.P.: Adaptive approximate bayesian computation. Biometrika 96(4), 983–990 (2009)
Beaumont, M.A., Zhang, W., Balding, D.J.: Approximate bayesian computation in population genetics. Genetic 162(4), 2025–2035 (2002)
Becker, N.: On parametric estimation for mortal branching processes. Biometrika 61(3), 393–399 (1974)
Corral, A., García-Millán, R., Font-Clos, F.: Exact derivation of a finite-size scaling law and corrections to scaling in the geometric Galton–Watson process. PLoS One 11(9), e0161586 (2016)
Drovandi, C.C., Pettitt, A.N., McCutchan, R.A.: Exact and approximate Bayesian inference for low integer-valued time series models with intractable likelihoods. Bayesian Anal. 11(2), 325–352 (2016). https://doi.org/10.1214/15-BA950
Filippi, S., Barnes, C.P., Cornebise, J., Stumpf, M.P.H.: On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo. Stat. Appl. Genet. Mol. Biol. 12(1), 87–107 (2013). https://doi.org/10.1515/sagmb-2012-0069
Frazier, D.T., Robert, C.P., Rousseau, J.: Model misspecification in approximate Bayesian computation: consequences and diagnostics. J. R. Stat. Soc. Ser. B (Stat. Methodol.) https://doi.org/10.1111/rssb.12356,(2020)
Genz, A., Bretz, F., Miwa, T., Mi, X., Leisch, F., Scheipl, F., Hothorn, T.: mvtnorm: Multivariate Normal and t Distributions (2020) R package version 1.1–0. https://CRAN.R-project.org/package=mvtnorm
González, M., Gutiérrez, C., Martínez, R., Minuesa, C., del Puerto, I.: Bayesian analysis for controlled branching processes. In: I. del Puerto, M. González, C. Gutiérrez, R. Martínez, C. Minuesa, M. Molina, M. Mota, A. Ramos (eds.) Branching Processes and Their Applications, Lecture Notes in Statistics, vol. 219, pp. 185–205. Springer, Switzerland (2016)
González, M., Gutiérrez, C., Martínez, R., del Puerto, I.: Bayesian inference for controlled branching processes through MCMC and ABC methodologies. Revista de la Real Academia de Ciencias Exactas, Físicas y Naturales. Serie A. Matemáticas 107(2), 459–473 (2013)
González, M., Minuesa, C., del Puerto, I.: Maximum likelihood estimation and Expectation–Maximization algorithm for controlled branching processes. Comput. Stat. Data Anal. 93, 209–227 (2016)
González, M., Minuesa, C., del Puerto, I.: Minimum disparity estimation in controlled branching processes. Electron. J. Stat. 11(1), 295–325 (2017)
González, M., Minuesa, C., del Puerto, I., Vidyashankar, A.N.: Robust estimation in controlled branching processes: Bayesian estimators via disparities, 1–62 (2018) arXiv:1802.05917
González, M., del Puerto, I., Yanev, G.P.: Controlled branching processes. ISTE Ltd, London, Wiley, Hoboken (2018)
Guttorp, P.: Statistical inference for branching processes. Wiley, New York (1991)
Guttorp, P., Perlman, M.D.: Predicting extinction or explosion in a Galton-Watson branching process with power series offspring distribution. J. Stat. Plan. Infer. 165, 193–215 (2015)
Holgate, P., Lakhani, K.H.: Effect of offspring distribution on population survival. Bull. Math. Biophys. 29(4), 831–839 (1967)
Hyrien, O., Ambeskovic, I., Mayer-Proschel, M., Noble, M., Yakovlev, A.: Stochastic modeling of oligodendrocyte generation in cell culture: model validation with time-lapse data. Theor. Biol. Med. Model. 3(1), 21 (2006)
Lintusaari, J., Gutmann, M.U., Dutta, R., Kaski, S., Corander, J.: Fundamentals and recent developments in approximate Bayesian Computation. Syst. Biol. 66(1), e66–e68 (2017)
Martínez, R., Mota, M., del Puerto, I.: On asymptotic posterior normality for controlled branching processes. Statistics 43(4), 367–378 (2009)
McKinley, T., Cook, A.R., Deardon, R.: Inference in epidemic models without likelihoods. Int. J. Biostat. 5(1), 24 (2009)
Owen, J., Wilkinson, D.J., Gillespie, C.S.: Likelihood free inference for Markov processes: a comparison. Stat. Appl. Genet. Mol. Biol. 14(2), 189–209 (2015)
Plummer, M., Best, N., Cowles, K., Vines, K.: CODA: convergence diagnosis and output analysis for MCMC. R News 6(1), 7–11 (2006)
Prangle, D.: Adapting the ABC distance function. Bayesian Anal. 12(1), 289–309 (2017)
R Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2020). http://www.R-project.org/
Robert, C.P.: Approximate Bayesian Computation: A survey on recent results. In: R. Cools, D. Nuyens (eds.) Monte Carlo and Quasi-Monte Carlo Methods, Springer Proceedings in Mathematics & Statistics, vol. 163, pp. 185–205. Springer International Publishing (2016)
Sevast’yanov, B.A., Zubkov, A.M.: Controlled branching processes. Theory Prob. Appl. 19(1), 15–25 (1974)
Venables, W.N., Ripley, B.D.: Modern applied statistics with S. Statistics and computing, 4th edn. Springer, New York (2002)
Weiß, C.H.: Fully observed INAR(1) processes. J. Appl. Stat. 39(3), 581–598 (2012)
Yakovlev, A.Y., Stoimenova, V.K., Yanev, N.M.: Branching processes as models of progenitor cell populations and estimation of the offspring distributions. J. Am. Stat. Assoc. 103(484), 1357–1366 (2008)
Yanev, N.M.: Conditions for degeneracy of \(\phi \)-branching processes with random \(\phi \). Theory Prob. Appl. 20, 421–428 (1975)
Acknowledgements
The authors would like to thank the Associate Editor and the Reviewers for providing valuable comments and suggestions which have improved this paper. This research was supported by the Ministerio de Educación, Cultura y Deporte (Grant FPU13/03213), the Ministerio de Economía y Competitividad (Grant MTM2015-70522-P), the Junta de Extremadura (Grant IB16099), and the Fondo Europeo de Desarrollo Regional.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Supplementary material
Simulated data
The data for the simulated example presented in Sect. 4 are provided in Table 3. Recall that for the simulated CBP, which starts with \(Z_0=1\) individual, the reproduction law is a geometric distribution with parameter \(q=0.4\), and for each \(k\in \mathbb {N}_0\), the probability distribution of the control variable \(\phi _n(k)\) is a binomial distribution with parameters \(\xi (k)\) and \(\gamma =0.75\), with \(\xi (k)= k+\lfloor \log (k)\rfloor \), for each \(k\in \mathbb {N}\) and \(\xi (0)=0\).
Summary statistics
In order to highlight the importance of the choice of the summary statistic in the output of ABC algorithms, we also implemented them removing one of the proposed algorithm’s coordinates, i.e., we considered the summary statistics:
The results for the parameters \(\theta \), \(\gamma \), and \(\tau _m\) are presented in Tables 4, 5, and 6, respectively. One observes from these tables that removing the number of progenitors in the final generation from the observed sample, and consequently from the summary statistic, i.e., \({\mathcal {S}}_3\), provides similar accuracy measures for \(\tau _m\) – a consequence of the first convergence in (3). However, the estimates of its factors separately, i.e., m (dependent on \(\theta \)) and \(\gamma \), get worse. There particularly stands out the less accurate estimate of the control parameter, \(\gamma \), with a very wide 95% HPD interval. If the total progeny component is removed, i.e., \({\mathcal {S}}_1\), the integrated squared error (ISE) and the Kullback-Leibler divergence between the posterior densities (KL) for three parameters, \(\theta \), \(\gamma \), and \(\tau _m\), increase significantly. For ISE and KL, the posterior density given by the Gibbs sampler algorithm was taken as the referent. For the latter parameter, the 95% HPD intervals do not contain the true value. Finally, removing the second coordinate in \({\mathcal {S}}\), i.e., \({\mathcal {S}}_2\), results in a worsening of the three accuracy measures for the three parameters considered.
Sensitivity analysis
We performed a sensitivity analysis regarding the choices of the prior distributions. Recall that both the offspring and the control parameters are probabilities, so that beta distributions seemed to be reasonable options as their prior distributions. In this analysis therefore, we considered different values for the shape parameters of beta distributions. The results for the Gibbs sampler algorithm and the SMC ABC algorithm with the local lineal regression adjustment using the metric \(\rho _1\) are summarized in Tables 7 and 8 for the posterior densities \(\pi (\theta |\widetilde{{\mathcal {Z}}}_{30})\) and \(\pi (\gamma |\widetilde{{\mathcal {Z}}}_{30})\), respectively. They indicate that the estimation of the posterior densities is not sensitive to the choices of the prior distributions.
It is worth mentioning again that the ABC method relies on having ease of sampling from the model. In our implementation, this implies knowledge of the parametric families. An interesting issue is to study the performance of the algorithms when one knows that those distributions can be parametrized with a one-dimensional parameter, but the kind of parametric distribution one should use to that end is unknown. For this purpose, we shall here present a summary of the results of the sensitivity analysis regarding the choice of the parametric families for the offspring and control distributions for three different CBPs. Apart from the previous example, we simulated 30 generations of two additional CBPs starting with one individual: the first of them has a geometric distribution with parameter \(\theta =0.918\) as the offspring distribution and the control variables \(\phi _n(k)\) follow binomial distributions with parameters \(\xi (k)\) and \(\gamma =0.1\), and, in the second, the offspring distribution is a geometric distribution with parameter \(\theta =0.556\) and it has control variables \(\phi _n(k)\) following a binomial distribution with parameters \(\xi (k)\) and \(\gamma =0.9\). We proposed two offspring laws (geometric and Poisson distributions) and three control laws (binomial, Poisson, and negative binomial distributions), and ran the SMC ABC algorithm for the posterior densities \(\pi (m|\widetilde{{\mathcal {Z}}}_{30})\), \(\pi (\tau |\widetilde{{\mathcal {Z}}}_{30})\), and \(\pi (\tau _m|\widetilde{{\mathcal {Z}}}_{30})\) in each of these six situations for the three examples considered. We opted for these three parameters since it is more reasonable to compare the results based on different distributions in terms of the offspring mean, the parameter \(\tau \), and the asymptotic mean growth rate of the process, which are the stable parameters of the process.
The results are given in Table 9. They show that, for the different models used for simulating, the method usually identifies relatively well the offspring mean, the parameter \(\tau \), and the asymptotic mean growth rate, which is also indicative of the goodness of our summary statistic. Particularly noteworthy is the case of the second simulated model when a Poisson distribution is taken for the control laws. The algorithm gives a good estimate for the posterior density of the parameter \(\tau \), but for that of m (and consequently, of \(\tau _m\)) not as good as when a binomial or negative binomial control law distribution is assumed. This is related to the magnitude of the parameter \(\theta =0.918\). Indeed, when we used the geometric distribution in simulating the model we also estimated the posterior density of \(\theta \), resulting in a mean value of 0.9079 and variance of \(2.0629\cdot 10^{-5}\), which seems to be a good estimate of that posterior density. However, the small bias observed in this estimate was amplified when we estimated m due to the fact that \(m=\theta (1-\theta )^{-1}\).
Similar results were obtained when one is unaware of the existence of the function \(\xi (\cdot )\). To examine that problem in this example, we considered the three models described above, and repeated the study by applying the same ABC method for the posterior distributions \(\pi (m|\widetilde{{\mathcal {Z}}}_{30})\), \(\pi (\tau |\widetilde{{\mathcal {Z}}}_{30})\), and \(\pi (\tau _m|\widetilde{{\mathcal {Z}}}_{30})\). Again, we used geometric and Poisson offspring distributions and binomial, Poisson, and negative binomial control law distributions. The results are presented in Table 10, where one observes that there is no significant difference with respect to those in Table 9 except for a slight increase in the RMSE in most cases. This similarity is due to the fact \(\varepsilon (k)=(k+\lfloor \log (k)\rfloor )\gamma \sim k\gamma \), as \(k\rightarrow \infty \).
Rights and permissions
About this article
Cite this article
González, M., Martínez, R., Minuesa, C. et al. Approximate Bayesian computation in controlled branching processes: the role of summary statistics. RACSAM 114, 116 (2020). https://doi.org/10.1007/s13398-020-00839-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s13398-020-00839-x