Modeling spatially biased citizen science effort through the eBird database

Environmental and Ecological Statistics

Abstract

Citizen science databases are increasingly important sources of ecological information, but variability in effort across locations is inherent to such data. Spatially biased data, that is, data not sampled uniformly across the study region, are expected. A further source of bias is variability in the level of sampling activity across locations. This motivates our work: given a spatial dataset of visited locations and the sampling activity at those locations, we propose a model-based approach for assessing effort at these locations. Adjusting for potential spatial bias, both in the sites visited and in effort, is crucial for developing reliable species distribution models (SDMs). Using data from eBird, a global citizen science database dedicated to avifauna, and illustrative regions in Pennsylvania and Germany, we model spatial dependence in both the observation locations and the observed activity. We employ point process models to explain the observed locations in space, fit a geostatistical model to explain observation effort at those locations, and explore the potential existence of preferential sampling, i.e., dependence between the two processes. Altogether, we offer a richer notion of sampling effort, combining information about location and activity. As SDMs are often used for their predictive capabilities, an important advantage of our approach is the ability to predict effort at unobserved locations and over regions. In this way, we can accommodate misalignment between point-referenced data and, say, a desired areal-scale density. We briefly illustrate how our proposed methods can be applied to SDMs, with demonstrated improvement in prediction from models incorporating effort.
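
To make the idea of preferential sampling concrete, the following is a minimal simulation sketch and not the authors' implementation. It assumes one simple way the two processes could depend on each other: a single latent Gaussian surface `w` raises both the intensity of visited locations and the log effort recorded there. The grid approximation to the point process, the exponential covariance, the log-normal effort model, and all names and parameter values (`simulate_effort`, `beta0`, `alpha0`, `delta`, `range_`, `sigma`, `tau`) are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def simulate_effort(n_grid=20, beta0=5.0, alpha0=1.0, delta=0.8,
                    range_=0.2, sigma=1.0, tau=0.3, seed=0):
    """Simulate visited locations and effort that share a latent surface.

    Hypothetical sketch: delta controls the dependence between where
    observers go and how much effort they record (delta = 0 means no
    preferential sampling under this toy model).
    """
    rng = np.random.default_rng(seed)

    # Cell centres of an n_grid x n_grid grid on the unit square.
    centres = (np.arange(n_grid) + 0.5) / n_grid
    gx, gy = np.meshgrid(centres, centres)
    cells = np.column_stack([gx.ravel(), gy.ravel()])
    n_cells = len(cells)

    # Latent Gaussian process w with exponential covariance.
    dist = np.linalg.norm(cells[:, None, :] - cells[None, :, :], axis=-1)
    cov = sigma**2 * np.exp(-dist / range_)
    chol = np.linalg.cholesky(cov + 1e-8 * np.eye(n_cells))
    w = chol @ rng.standard_normal(n_cells)

    # Locations: Poisson counts per cell with log-intensity beta0 + w,
    # a gridded approximation to a log-Gaussian Cox process.
    cell_area = 1.0 / n_grid**2
    counts = rng.poisson(np.exp(beta0 + w) * cell_area)
    idx = np.repeat(np.arange(n_cells), counts)

    # Place each point uniformly inside its cell.
    jitter = rng.uniform(-0.5, 0.5, size=(len(idx), 2)) / n_grid
    locs = cells[idx] + jitter

    # Effort: log-normal, with mean shifted by the same latent surface.
    log_effort = alpha0 + delta * w[idx] + tau * rng.standard_normal(len(idx))
    return locs, np.exp(log_effort), w[idx]


if __name__ == "__main__":
    locs, effort, w_at_locs = simulate_effort()
    corr = np.corrcoef(np.log(effort), w_at_locs)[0, 1]
    print(f"{len(effort)} locations; corr(log effort, w) = {corr:.2f}")
```

Setting `delta=0` in this sketch removes the dependence between the location process and the effort process, which gives a simple baseline for comparison with the preferentially sampled case.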

Acknowledgements

For comments on the manuscript we thank Valentin Journe, Ruben Palacio, Renata Poulton Kamakura, Tong Qiu, Chantal Reid, C. Lane Scher, Shubhi Sharma, and Maggie Swift.

Funding

The project was funded by the National Science Foundation (NSF-DEB-1754443, NSF ICER/Belmont Forum Biodiversa), NASA (AIST18-0063), and the Programme d’Investissement d’Avenir under project FORBIC (18-MPGA-0004).

Author information

Contributions

J.S.C. conceived of the study, and A.E.G. and J.S.C. contributed to formulation of models. B.T. obtained the data and implemented the analyses. A.E.G. and B.T. drafted the paper with contributions from J.S.C. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Becky Tang.

Ethics declarations

Conflicts of interest

All authors declare that they have no conflict of interest.

Data availability

All data used in this study are available at https://github.com/beckytang/ebird_data.

Code availability

All code used in this study is available at https://github.com/beckytang/ebird_data.

Additional information

Handling Editor: Pierre R. L. Dutilleul

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1858 KB)

Cite this article

Tang, B., Clark, J.S. & Gelfand, A.E. Modeling spatially biased citizen science effort through the eBird database. Environ Ecol Stat 28, 609–630 (2021). https://doi.org/10.1007/s10651-021-00508-1

  • DOI: https://doi.org/10.1007/s10651-021-00508-1
