Abstract
In this paper we present a weighted mixture distribution component counts (MDCC) approach for estimating the total number of species. The proposed method combines conditional estimates of component counts from several candidate mixture distributions and uses the bootstrap for confidence interval estimation. The distribution specification is flexible and can be adjusted to suit a variety of datasets. Smoothing techniques can also be incorporated to improve modeling of sparse data. The method is evaluated in a simulation study and applied to two microbiome datasets for illustration. Simulation results indicate reduced bias and mean squared error and improved confidence interval coverage relative to comparison methods, as well as robustness to the underlying data structure.
Data availability
Office dust microbiome data is available from Qiita (http://qiita.microbio.me, Study ID: 10423). Plant microbiome data is available from Dryad (https://datadryad.org/stash/dataset/doi:10.5061/dryad.g60r3).
Acknowledgements
We would like to thank the Associate Editor and anonymous reviewer for their comments and feedback, which improved the substance and presentation of this manuscript.
Additional information
Handling Editor: Pierre Dutilleul.
Appendix A: simulation scenarios
See Table 4.
Cite this article
Shestopaloff, K., Xu, W. & Escobar, M.D. Estimating total species using a weighted combination of expected mixture distribution component counts. Environ Ecol Stat 27, 447–465 (2020). https://doi.org/10.1007/s10651-020-00452-6
DOI: https://doi.org/10.1007/s10651-020-00452-6