Abstract
Citizen science databases are increasingly important sources of ecological information, but variability in effort across locations is inherent to such data. Spatially biased data, that is, data not sampled uniformly across the study region, are expected. Variability in the level of sampling activity across locations introduces further bias. This motivates our work: given a spatial dataset of visited locations and the sampling activity at those locations, we propose a model-based approach for assessing effort at these locations. Adjusting for potential spatial bias, both in the sites visited and in the effort expended, is crucial for developing reliable species distribution models (SDMs). Using data from eBird, a global citizen science database dedicated to avifauna, and illustrative regions in Pennsylvania and Germany, we model spatial dependence in both the observation locations and the observed activity. We employ point process models to explain the observed locations in space, fit a geostatistical model to explain observation effort at those locations, and explore the potential existence of preferential sampling, i.e., dependence between the two processes. Altogether, we offer a richer notion of sampling effort that combines information about location and activity. Because SDMs are often used for their predictive capabilities, an important advantage of our approach is the ability to predict effort at unobserved locations and over regions. In this way, we can accommodate misalignment between point-referenced data and, say, a desired areal-scale density. We briefly illustrate how our proposed methods can be applied to SDMs, with demonstrated improvement in prediction from models incorporating effort.
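The two-process idea in the abstract (a point process for visited locations, a geostatistical model for effort, and possible dependence between them) can be illustrated with a toy simulation. The sketch below is not the authors' implementation: it uses a gridded approximation, and the function name, the parameters `beta` and `gamma`, the exponential covariance, and all numeric values are assumptions chosen only for illustration.

```python
import numpy as np

def simulate_preferential_effort(n_grid=30, beta=1.0, gamma=0.8, seed=0):
    """Toy simulation on the unit square (all names and values are hypothetical)."""
    rng = np.random.default_rng(seed)

    # Centres of an n_grid x n_grid grid of cells over the unit square.
    xs = (np.arange(n_grid) + 0.5) / n_grid
    gx, gy = np.meshgrid(xs, xs)
    coords = np.column_stack([gx.ravel(), gy.ravel()])

    # Shared latent Gaussian field w(s) with an assumed exponential covariance.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov = np.exp(-dist / 0.2) + 1e-8 * np.eye(len(coords))
    w = np.linalg.cholesky(cov) @ rng.standard_normal(len(coords))

    # Location process: Poisson counts per cell, log-intensity linear in w.
    cell_area = 1.0 / n_grid**2
    intensity = np.exp(np.log(200.0) + beta * w)
    counts = rng.poisson(intensity * cell_area)
    visited = counts > 0

    # Effort process at visited cells: log-effort depends on the same field
    # through gamma; gamma = 0 would mean no dependence between the processes.
    noise = 0.3 * rng.standard_normal(int(visited.sum()))
    log_effort = 1.0 + gamma * w[visited] + noise
    return coords, w, counts, visited, log_effort

coords, w, counts, visited, log_effort = simulate_preferential_effort()
print("visited cells:", int(visited.sum()), "of", len(w))
print("mean field at visited cells:", w[visited].mean(), "overall:", w.mean())
```

With `beta > 0`, cells where the latent field is high are visited more often, so summaries computed only at visited cells overstate the field (and, through `gamma`, the effort) relative to the region as a whole. This is the kind of dependence the abstract refers to as preferential sampling.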
Acknowledgements
For comments on the manuscript we thank Valentin Journe, Ruben Palacio, Renata Poulton Kamakura, Tong Qiu, Chantal Reid, C. Lane Scher, Shubhi Sharma, and Maggie Swift.
Funding
The project was funded by the National Science Foundation (NSF-DEB-1754443, NSF ICER/Belmont Forum Biodiversa), NASA (AIST18-0063), and the Programme d’Investissement d’Avenir under project FORBIC (18-MPGA-0004).
Contributions
J.S.C. conceived of the study, and A.E.G. and J.S.C. contributed to formulation of models. B.T. obtained the data and implemented the analyses. A.E.G. and B.T. drafted the paper with contributions from J.S.C. All authors read and approved the final manuscript.
Ethics declarations
Conflicts of interest
All authors declare that they have no conflict of interest.
Data availability
All data used in this study are available at https://github.com/beckytang/ebird_data.
Code availability
All code used in this study is available at https://github.com/beckytang/ebird_data.
Additional information
Handling Editor: Pierre R. L. Dutilleul
About this article
Cite this article
Tang, B., Clark, J.S. & Gelfand, A.E. Modeling spatially biased citizen science effort through the eBird database. Environ Ecol Stat 28, 609–630 (2021). https://doi.org/10.1007/s10651-021-00508-1
DOI: https://doi.org/10.1007/s10651-021-00508-1