Abstract
A presence–absence map records, for each cell of a grid, whether a given species occurs there, without counting the individuals in occupied cells. Such maps are commonly used to estimate the distribution of a species, but our interest is in using these data to estimate the abundance of the species. In practice, some species (in particular plants) may be spatially clustered; for example, some plant communities naturally group together according to similar environmental characteristics within a given area. To estimate abundance, we develop an approach based on clustered negative binomial models with unknown cluster sizes. Our approach uses working clusters of cells to construct an estimator, which we show is consistent. We also introduce a new concept, super-clustering, which is used to estimate components of the standard errors and interval estimators. A simulation study is conducted to examine the performance of the estimators, and they are applied to real data.
References
Arrhenius O (1921) Species and area. J Ecol 9:95–99
Beissinger SR, Iknayan KJ, Guillera-Arroita G, Zipkin EF, Dorazio RM, Royle JA, Kéry M (2016) Incorporating imperfect detection into joint models of communities: a response to Warton et al. Trends Ecol Evol 31:736–737
Chen G, Kéry M, Plattner M, Ma K, Gardner B (2013) Imperfect detection is the rule rather than the exception in plant distribution studies. J Ecol 101:183–191
Condit R (1998) Tropical forest census plots. Springer-Verlag and R. G. Landes Company, Berlin
Conlisk E, Conlisk J, Enquist B, Thompson J, Harte J (2009) Improved abundance prediction from presence-absence data. Glob Ecol Biogeogr 18:1–10
Dunstan PK, Foster SD, Darnell R (2011) Model based grouping of species across environmental gradients. Ecol Model 222:955–963
Guillera-Arroita G (2017) Modelling of species distributions, range dynamics and communities under imperfect detection: advances, challenges and opportunities. Ecography 40:281–295
Guillera-Arroita G, Lahoz-Monfort JJ, MacKenzie DI, Wintle BA, McCarthy MA (2014) Ignoring imperfect detection in biological surveys is dangerous: a response to “Fitting and interpreting occupancy models”. PLoS ONE 9:e99571
He F, Gaston KJ (2000) Estimating species abundance from occurrence. Am Nat 156:553–559
He F, Gaston KJ (2007) Estimating abundance from occurrence: an underdetermined problem. Am Nat 170:655–659
He F, Reed W (2006) Downscaling abundance from the distribution of species: occupancy theory and applications. In: Wu J, Jones KB, Li H, Loucks OL (eds) Scaling and uncertainty analysis in ecology: methods and applications. Springer, Dordrecht, pp 89–108
Hubbell SP, Condit R, Foster RB (2005) Barro Colorado Forest Census Plot Data. http://ctfs.si.edu/webatlas/datasets/bci
Hubbell SP, Foster RB, O’Brien ST, Harms KE, Condit R, Wechsler B, Wright SJ, Loo de Lao S (1999) Light gap disturbances, recruitment limitation, and tree diversity in a neotropical forest. Science 283:554–557
Hwang WH, He F (2011) Estimating abundance from presence/absence maps. Methods Ecol Evol 2:550–559
Hwang WH, Huggins RM (2016) Estimating abundance from presence-absence maps via a paired negative binomial model. Scand J Stat 43:573–586
Hwang WH, Huggins RM, Chen LF (2017) A note on the inverse birthday problem with applications. Am Stat 71:191–201
Kunin WE (1998) Extrapolating species abundance across spatial scales. Science 281:1513–1515
Kunin WE, Hartley S, Lennon JJ (2000) Scaling down: on the challenge of estimating abundance from occurrence patterns. Am Nat 156:560–566
MacKenzie DI, Nichols JD, Royle JA, Pollock KH, Bailey LL, Hines JE (2006) Occupancy estimation and modeling: inferring patterns and dynamics of species occurrence. Elsevier/Academic Press, Burlington
Manly BFJ, Navarro Alberto JA (2015) Introduction to ecological sampling. Chapman & Hall/CRC, London
Muller CH, Huggins RM, Hwang WH (2011) Consistent estimation of species abundance from a presence-absence map. Stat Probab Lett 81:1449–1457
Nadarajah S, Kotz S (2008) Exact distribution of the max/min of two Gaussian random variables. IEEE Trans Very Large Scale Integr (VLSI) Syst 16:210–212
Novotny V, Miller S, Hulcr J, Drew R, Basset Y, Janda M, Setliff G, Darrow K, Stewart A, Auga J, Isua B, Molem K, Manumbor M, Tamtiai E, Mogia M, Weiblen G (2007) Low beta diversity of herbivorous insects in tropical forests. Nature 448:692–695
Ross L, Woodin S, Hester A, Thompson D, Birks H (2012) Biotic homogenization of upland vegetation: Patterns and drivers at multiple spatial scales over five decades. J Veg Sci 23:755–770
Royle JA, Dorazio RM (2008) Hierarchical modeling and inference in ecology: the analysis of data from populations, metapopulations and communities. Academic Press, San Diego
Royle JA, Nichols JD (2003) Estimating abundance from repeated presence-absence data or point counts. Ecology 84:777–790
Shen G, He F, Waagepetersen R, Sun IF, Hao Z, Chen ZS, Yu M (2013) Quantifying effects of habitat heterogeneity and other clustering processes on spatial distributions of tree species. Ecology 94:2436–2443
Sherman M (2011) Spatial statistics and spatio-temporal data: covariance functions and directional properties. Wiley, New York
Solow AR, Smith WK (2010) On predicting abundance from occupancy. Am Nat 176:96–98
Welsh AH, Lindenmayer DB, Donnelly CF (2013) Fitting and interpreting occupancy models. PLoS ONE 8:e52015
Williams BK, Nichols JD, Conroy MJ (2002) Analysis and management of animal populations. Academic Press, San Diego
Yoccoz NG, Nichols JD, Boulinier T (2001) Monitoring of biological diversity in space and time. Trends Ecol Evol 16:446–453
Yin D, He F (2014) A simple method for estimating species abundance from occurrence maps. Methods Ecol Evol 5:336–343
Acknowledgements
We are grateful to the Associate Editor and a referee for providing helpful comments and constructive suggestions, especially for suggesting the use of the jackknife standard error. The BCI forest dynamics research project was founded by S.P. Hubbell and R.B. Foster and is now managed by R. Condit, S. Lao, and R. Perez under the Center for Tropical Forest Science and the Smithsonian Tropical Research Institute in Panama. Numerous organizations have provided funding, principally the U.S. National Science Foundation, and hundreds of field workers have contributed to this project. This work was supported by the Ministry of Science & Technology of Taiwan.
Additional information
Handling Editor Pierre Dutilleul.
Appendices
Appendix A: Elementary calculations
A.1 One Cluster
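We establish some probabilities associated with a single cluster. Consider a single cell. Let \(\lambda \sim \mathrm{Gamma}(\kappa ,\beta )\), which has density
$$\begin{aligned} f(\lambda )=\frac{\lambda ^{\kappa -1}\exp (-\lambda /\beta )}{\Gamma (\kappa )\beta ^{\kappa }}, \end{aligned}$$
where \(\lambda >0\) and \(\Gamma (\cdot )\) is the usual gamma function. We suppose that, given \(\lambda \), the number X of individuals of a species in a grid cell satisfies \(X \sim \mathrm{Poisson}(\lambda )\), so that if \(Z=I(X=0)\) we have \(p(\lambda )=P(Z=1\mid \lambda )=P(X=0 \mid \lambda )=\exp (-\lambda )\).
Consider a cluster of c random variables \(X_1,\ldots ,X_{c}\) that share the same value of \(\lambda \) and, given \(\lambda \), are independent \(\mathrm{Poisson}(\lambda )\) distributed. Hence, \(X_1,\ldots , X_{c}\) are exchangeable. Then, with \(s=x_1+\cdots +x_{c}\), integrating over \(\lambda \) yields the joint distribution
$$\begin{aligned} P(X_1=x_1,\ldots ,X_c=x_c)=\frac{\Gamma (\kappa +s)}{\Gamma (\kappa )\prod _{\ell =1}^{c}x_\ell !}\,\frac{\beta ^{s}}{(1+c\beta )^{\kappa +s}}. \end{aligned}$$
Clearly the marginal distributions of the \(X_\ell \) are negative binomial:
$$\begin{aligned} P(X_\ell =x)=\frac{\Gamma (\kappa +x)}{\Gamma (\kappa )\,x!}\left( \frac{\beta }{1+\beta }\right) ^{x}\left( \frac{1}{1+\beta }\right) ^{\kappa },\quad x=0,1,\ldots , \end{aligned}$$
so that \(E(X_\ell )=\beta \kappa \), \(\mathrm{Var}(X_\ell )=\kappa \beta (1+\beta )\) and \(\mathrm{Cov}(X_1,X_2)=\kappa \beta ^2\). A direct calculation leads to \(\mathrm{Var}\left( \sum _{\ell =1}^{c} X_\ell \right) =c\kappa \beta (1+\beta )+c(c-1)\kappa \beta ^2=\kappa (\beta ^2c^2+\beta c)\).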
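The moments quoted above can be checked numerically. The following is a minimal sketch, assuming the gamma distribution is parameterized with shape \(\kappa \) and scale \(\beta \) (the parameterization under which \(E(X_\ell )=\beta \kappa \)); the function names `joint_pmf` and `check_moments`, the cut-off `n_max` and the parameter values are illustrative choices, not part of the article.

```python
import math


def joint_pmf(xs, kappa, beta):
    """Joint pmf of counts xs in c cells that share one Gamma(kappa, scale=beta) rate.

    Obtained by integrating the product of Poisson pmfs over the gamma density.
    """
    c, s = len(xs), sum(xs)
    log_p = (math.lgamma(kappa + s) - math.lgamma(kappa)
             - sum(math.lgamma(x + 1) for x in xs)
             + s * math.log(beta) - (kappa + s) * math.log1p(c * beta))
    return math.exp(log_p)


def check_moments(kappa, beta, n_max=150):
    """Return (total mass, mean, variance, covariance) from truncated sums.

    n_max is an assumed truncation point; the neglected tail is negligible
    for moderate kappa and beta.
    """
    marginal = [joint_pmf((x,), kappa, beta) for x in range(n_max)]
    total = sum(marginal)
    mean = sum(x * p for x, p in enumerate(marginal))
    var = sum(x * x * p for x, p in enumerate(marginal)) - mean ** 2
    e_x1x2 = sum(x1 * x2 * joint_pmf((x1, x2), kappa, beta)
                 for x1 in range(n_max) for x2 in range(n_max))
    return total, mean, var, e_x1x2 - mean ** 2


kappa, beta = 1.5, 0.8
total, mean, var, cov = check_moments(kappa, beta)
# Compare with the closed forms kappa*beta, kappa*beta*(1+beta), kappa*beta**2.
print(total, mean, var, cov)
```

With one cell the joint pmf reduces to the negative binomial marginal, so the same function serves both checks.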
Now, let \(F_k\) be the number of nonempty cells in the kth working cluster, \(k=1,\ldots ,M/c'\), and let \(p_{c'}(\ell ;\theta )= P(F_{k}=\ell )\), \(\ell =0,1,\ldots ,c'\). Then, with \(\theta =(\beta ,\kappa )^\intercal \),
A.2 Multiple Clusters
Lemma 1
Under the c-cluster model,
Proof
First, from (11) we have
which yields (12). The law of large numbers yields
and (13) follows. \(\square \)
Lemma 2
Proof
First, \(T=M/c\), \(K=c/c'\) and
so that \(T{\partial g_t(\theta )}/{\partial \theta ^\intercal }=M D(\theta )\). \(\square \)
Lemma 3
\(\mathrm{Cov}(g_t(\theta ))=K \Sigma (\theta ,K)\).
Proof
Elementary calculations show that for \(j \ne \ell \), we have
and
Also, note that
and
\(\square \)
Note that \(\Sigma (\theta ,K)\) depends on the true cluster size c through K.
Lemma 4
Let \(\theta ^0\) denote the true value of \(\theta \). Then
and hence,
Proof
First, note that if the pair of random variables (W, X) are independent of the pair (Y, Z) and w.l.o.g. all have zero means then
so that \((W+X)^2+(Y+Z)^2\) and \((W+X+Y+Z)^2\) have the same mean. Now, using (6), \(g_t^*(\theta )\), \(t=1,\ldots ,T^*\) are independently and identically distributed with zero means and arguing as in Lemma 3, have the common covariance matrix \(s^*K\Sigma (\theta ^0,K)=K^*\Sigma (\theta ^0,K)\). Hence
and noting that \(T^*K^*=M/c'\) yields (14). Next,
and (15) follows from the Toeplitz lemma. \(\square \)
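The first step in the proof of Lemma 4 rests on a cross term vanishing. As a sketch in the proof's own notation (zero-mean W, X, Y, Z with the pair (W, X) independent of the pair (Y, Z)), and not necessarily in the form printed in the article, that step can be written as

$$\begin{aligned} E\{(W+X+Y+Z)^2\}&=E\{(W+X)^2\}+E\{(Y+Z)^2\}+2E\{(W+X)(Y+Z)\}\\ &=E\{(W+X)^2\}+E\{(Y+Z)^2\}, \end{aligned}$$

since independence and the zero means give \(E\{(W+X)(Y+Z)\}=E(W+X)E(Y+Z)=0\).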
Lemma 5
Recall \(X_t=\sum _{j=1}^K\sum _{k=1}^{c'} X_{tjk}\). Then \(\mathrm{Cov}\{g_t(\theta ), X_t\}=K C(\theta ,K)\) where \(C(\theta ,K)\) is given by (9), i.e.,
Proof
First, \(Z_{tj} X_{tj}=0\) and as \(E(X_{tj})=c'\kappa \beta \), it is easily seen that
Next,
Now, as \(X_{t\ell } \mid \lambda _t \sim \mathrm{Poisson}(c' \lambda _t)\) and for \(\ell \ne j\) given \(\lambda _t\), \(X_{t\ell }\) and \(Z_{tj}\) are independent, then for \(\ell \ne j\) and \(s>0\),
so that
and hence
Thus, the first term in \(C(\theta ,K)\) is
Similarly,
\(\square \)
Appendix B: Proofs of Theorems 2, 3, and 5
B.1 Proof of Theorem 2
Now \(g(\theta )\) is the sum of the i.i.d. vectors \(g_t(\theta )\), \(t=1,\ldots ,T\). At \(\theta =\theta ^0\) these have zero means and, from Lemma 3, the covariance matrix \(K\Sigma (\theta ^0,K)\). The central limit theorem for independent random vectors yields \(T^{-1/2}g(\theta ^0) \buildrel D \over \longrightarrow N(0,K\Sigma (\theta ^0,K))\). Also, by Lemma 2, \({\partial g(\theta )}/{\partial \theta ^\intercal }=MD(\theta )\) and \(\tilde{\theta }_{c'} -\theta ^0 \approx -M^{-1} D(\theta ^0) ^{-1}g(\theta ^0)\) so that
As a consequence, this implies that
where \(D(\theta ^0)^{-\intercal }=\{D(\theta ^0) ^{-1}\}^\intercal \).
B.2 Proof of Theorem 3
Note that under the c-cluster model, the \(X_t(\theta )\) are independent. Then,
As a consequence, we have
B.3 Proof of Theorem 5
Without loss of generality, suppose that \(c=n_1\times 1\) and \(c'=n_2 \times 1\). The proof is completed by considering the following four cases:
1. \(n_1=\phi n_2\), where \(\phi \in Z^+\).
This is the so-called proper case and the estimator is asymptotically unbiased.
2. \(n_1=\phi n_2+j\), where \(\phi , j\in Z^+\).
Let \(\ell \) be the least common multiple of \(n_1\) and \(n_2\) and let \(\ell /n_1=m_1\) and \(\ell /n_2=m_2\). Consider a block of \(\ell \times 1\) cells, which consists of \(m_1\) independent \(n_1\)-clusters. The same \(\ell \times 1\) cells are also divided into \(m_2\) \(n_2\)-clusters; however, some of these are dependent.
Of the \(m_2\) working clusters, some are contained in a single \(n_1\)-cluster and some span two \(n_1\)-clusters. Specifically, there may be \([m_2/2]+1\) types of working clusters, denoted by Type\(_s\), where \(s=0,1,\ldots ,S\) and \(S=[m_2/2]\). A working cluster is of Type\(_s\) if it has s cells from another \(n_1\)-cluster. (Type\(_0\) means that all \(n_2\) cells are from a single \(n_1\)-cluster.)
It is then easy to see that the empty (absence) probability of a Type\(_s\) cluster is \(\{(1+s\beta )^\kappa (1+(n_2- s)\beta )^\kappa \}^{-1}\). Let \(t_s\) be the frequency of Type\(_s\), so that \(\sum _{s=0}^S t_s= m_2\). Consequently,
$$\begin{aligned} E\left( \frac{n_2f_0^{n_2}}{M}\right) = \sum _{s=0}^S \frac{t_s}{m_2}\frac{1}{(1+s\beta )^\kappa \left\{ 1+(n_2- s)\beta \right\} ^\kappa } <\frac{1}{(1+n_2\beta )^{\kappa }} \end{aligned}$$
as \(t_0< m_2\).
3. \(n_2=\phi n_1\), where \(\phi \in Z^+\).
\(E(n_2f_0^{n_2}/M )=\frac{1}{(1+n_1\beta )^{\kappa \phi }}< \frac{1}{(1+n_2\beta )^{\kappa }}\) as \((1+n_1 \beta )^\phi > 1+\phi n_1 \beta = 1+n_2 \beta \).
4. \(n_2=\phi n_1+j\), where \(\phi , j \in Z^+\).
This case is similar to cases 2 and 3.
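The four cases can be explored numerically. The sketch below assumes a one-dimensional layout in which true clusters are consecutive runs of \(n_1\) cells, working clusters are consecutive runs of \(n_2\) cells, and both start at the same cell; it uses the fact that a set of cells drawing \(a_i\) cells from independent cluster i is empty with probability \(\prod _i (1+a_i\beta )^{-\kappa }\). The function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def empty_prob(counts, kappa, beta):
    """P(all cells empty) when counts[i] cells come from independent cluster i."""
    return math.prod((1.0 + a * beta) ** (-kappa) for a in counts if a > 0)


def mean_empty_prob(n1, n2, kappa, beta):
    """Average empty probability of the n2-cell working clusters over one
    period of lcm(n1, n2) cells, with true clusters of n1 consecutive cells."""
    period = n1 * n2 // math.gcd(n1, n2)
    probs = []
    for start in range(0, period, n2):
        counts = {}  # true-cluster index -> number of cells in this working cluster
        for cell in range(start, start + n2):
            counts[cell // n1] = counts.get(cell // n1, 0) + 1
        probs.append(empty_prob(counts.values(), kappa, beta))
    return sum(probs) / len(probs)


def proper_value(n2, kappa, beta):
    """Empty probability of a working cluster lying inside a single true cluster."""
    return (1.0 + n2 * beta) ** (-kappa)


kappa, beta = 1.5, 0.8
for n1, n2 in [(6, 3), (5, 3), (2, 6), (3, 5)]:  # cases 1 to 4, in order
    print(n1, n2, mean_empty_prob(n1, n2, kappa, beta), proper_value(n2, kappa, beta))
```

In the proper case the two printed values agree; in the other three the average empty probability falls below \((1+n_2\beta )^{-\kappa }\), in line with the inequalities above.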
About this article
Cite this article
Huggins, R., Hwang, WH. & Stoklosa, J. Estimation of abundance from presence–absence maps using cluster models. Environ Ecol Stat 25, 495–522 (2018). https://doi.org/10.1007/s10651-018-0415-5
DOI: https://doi.org/10.1007/s10651-018-0415-5