Estimating total species using a weighted combination of expected mixture distribution component counts

Environmental and Ecological Statistics

Abstract

In this paper we present a weighted mixture distribution component counts (MDCC) approach for estimating the total number of species. The proposed method combines conditional estimates of component counts from several candidate mixture distributions and uses the bootstrap for confidence interval estimation. The distribution specification is flexible and can be adjusted to suit a variety of datasets. Smoothing techniques can also be incorporated to improve the modeling of sparse data. The method is evaluated in a simulation study and illustrated on two microbiome datasets. Simulation results indicate improved bias, mean squared error, and confidence interval coverage relative to comparison methods, as well as robustness to the underlying data structure.

Data availability

Office dust microbiome data is available from Qiita (http://qiita.microbio.me, Study ID: 10423). Plant microbiome data is available from Dryad (https://datadryad.org/stash/dataset/doi:10.5061/dryad.g60r3).

Acknowledgements

We would like to thank the Associate Editor and anonymous reviewer for their comments and feedback, which improved the substance and presentation of this manuscript.

Author information

Corresponding author

Correspondence to Konstantin Shestopaloff.

Additional information

Handling Editor: Pierre Dutilleul.

Appendix A: simulation scenarios

See Table 4.

Table 4 List of component weights for each of the simulation scenarios, ordered by the expected proportion of zeros or unobserved species

Cite this article

Shestopaloff, K., Xu, W. & Escobar, M.D. Estimating total species using a weighted combination of expected mixture distribution component counts. Environ Ecol Stat 27, 447–465 (2020). https://doi.org/10.1007/s10651-020-00452-6

  • DOI: https://doi.org/10.1007/s10651-020-00452-6
