Temporal trends in opportunistic citizen science reports across multiple taxa

Knape, Jonas; Coulson, Stephen James; van der Wal, René; Arlt, Debora

doi:10.1007/s13280-021-01550-w

Research Article, Open access
Ambio, Volume 51, pages 183–198 (2022)
Published: 29 March 2021
Abstract

Opportunistic reporting of species observations to online platforms provides one of the most extensive sources of information about the distribution and status of organisms in the wild. The lack of a clear sampling design and changes in reporting over time lead to challenges when analysing these data for temporal change in organisms. To better understand temporal changes in reporting, we use records submitted to an online platform in Sweden (Artportalen), currently containing 80 million records. Focussing on five taxonomic groups (fungi, plants, beetles, butterflies and birds), we decompose change in reporting into long-term and seasonal trends, and effects of weekdays, holidays and weather variables. The large surge in the number of records since the launch of the portals, which were initially taxon-specific, is accompanied by non-trivial long-term and seasonal changes that differ between the taxonomic groups and are likely due to changes in, and differences between, the user communities and observer behaviour.


Introduction

Online portals through which volunteers submit reports of observations of wildlife gather large amounts of data worldwide. The majority of such species occurrence data are collected opportunistically with little or no underlying sampling design, but often with much higher temporal and spatial resolution than designed studies (Ruete et al. 2017). The wide availability and cost-effectiveness of opportunistic data, and the public engagement benefits they can provide, have attracted interest from stakeholders including governmental agencies, non-governmental organisations, universities and research institutes. For instance, governmental agency staff may use the data for infrastructure development planning or red-listing of species (Maes et al. 2015), and researchers may be interested in distributional range shifts (Prieto-Torres et al. 2020).

The opportunistic nature of the data leads to large variability in reporting, arising from a largely unknown mix of observer behaviour (August et al. 2020) and ecological processes. This variation poses challenges when the data are used for inference about populations, and also leads to a lack of trust in such inferences (Burgess et al. 2017). To gain a solid understanding of how variation in effort affects inference about ecological processes, and to be able to incorporate variation in effort in statistical models to correct for bias (Altwegg and Nichols 2019), we first need to understand the nature of variation in reporting. Previous studies have looked at how reporting varies spatially with anthropogenic variables such as road access (Mair and Ruete 2016; Zhang 2020) and human population density (van der Wal et al. 2015), and among observers (August et al. 2020). Few studies have investigated the details of temporal variation (but see Otegui et al. 2013; Zhang 2020). Reporting is, however, expected to vary in time with the availability of organisms over seasons and years; with factors affecting observer behaviour and movements, including weekends (Żmihorski et al. 2012), holidays and weather; and with factors affecting reporting behaviour, such as database reporting gaining in popularity (Amano et al. 2016), targeted efforts to increase reporting engagement (Sullivan et al. 2014), changes in reporting interfaces and functionality (August et al. 2015), and changes in the community of active users. Temporal variation in effort arising from changes in observation and reporting behaviour needs to be considered when making inferences about ecological processes in time, such as when estimating trends in occurrence or abundance, range shifts and trends in spatial distribution, and seasonal patterns (phenology) or their long-term trends.

Studies using opportunistic citizen science data to estimate temporal trends have stressed the importance of correcting estimates for variation in effort, but attempts to validate the estimated trends against trends derived from studies with stronger designs have produced mixed results (Snäll et al. 2011; van Strien et al. 2013; Kamp et al. 2016; Boersch-Supan et al. 2019). A better understanding of reporting patterns might eventually shed some light on when correction attempts may be successful, how to carry them out, and what changes to the reporting system could increase the usefulness of the data. For instance, knowledge of variation in reporting may be used to inform simulations that manipulate specific mechanisms believed to cause bias, in order to check the effects of those mechanisms on trend estimates (Isaac et al. 2014). Some methods to correct for observer bias also rely directly on an understanding of the variables influencing variation in effort (Johnston et al. 2020).

Our aim in this study is to investigate broad temporal patterns in reporting of birds, butterflies, beetles, vascular plants and fungi to the Swedish Species Observation System (Artportalen; Shah and Coulson 2021). We examine patterns at a daily resolution to understand how reporting has changed during the period from 2000, when online reporting for birds was first launched, to 2018. Specifically, we decompose change in reporting into long-term and seasonal patterns, effects of weekdays and holidays, and simple weather variables, and compare these patterns among the taxonomic groups.

Materials and Methods

Data

Response variables

The Swedish Species Observation System (Artportalen; https://www.artportalen.se/) is a web portal and database to which the public can submit reports of species observations across all multicellular taxa, from plants to animals; it currently holds 80 million records. The main user group is the general public, but the system is also integrated with the authorities' reporting of survey-based biodiversity data.

The data are largely ‘presence-only’, i.e., records of what has been observed, not of what has not been observed (Gelfand and Shirota 2019; an option for checklist reporting has, however, recently been added to the system). There is no requirement to adhere to any specific sampling design: observers choose which species to report, where to observe them, when, and what amount of time and effort to devote. These aspects of the data imply that the vast majority of data can be called ‘opportunistic’. However, some data from systematic surveys, with varying degrees of standardisation, are also submitted.

Since the launch of the web-based reporting system in 2000, several substantial changes have been made, including adding platforms for different organism groups and merging separate platforms into a single unified platform (Table 1). Here we make use of Artportalen’s 20-year history of known changes, its many well-recorded species groups with a high temporal density of data, and the opportunistic nature of the majority of observations to investigate temporal patterns in recording.

Table 1 Major changes to what is today the Swedish Species Observation System, Artportalen, artportalen.se


The data consist of records of observations of at least one individual of a single species from a single location by an observer at some point in time. There may be multiple records of single and/or multiple species at the same place and time by the same observer. In some cases there are repeat records of the same individual(s) of a species from multiple observers. Using the Swedish LifeWatch Analysis Portal (Leidenberger et al. 2016), we extracted all records between 2000 and 2018 of species from five selected species groups that had a large number of records compared to most other groups: birds (45 million records), butterflies (0.9 million), beetles (order Coleoptera; 0.5 million), vascular plants (division Tracheophyta; 4 million), and fungi (division Basidiomycota; 1.2 million). We removed observations with insufficient temporal resolution (recording time exceeding 1 day) and with uncertain species identification.
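A minimal sketch of the record filtering, assuming each record carries a start and an end date for the observation and a flag for uncertain identification. The `Record` fields and the `filter_records` helper are hypothetical and do not mirror the Analysis Portal's export format; "recording time exceeding 1 day" is interpreted here as the start and end dates differing.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical record layout; the real export uses its own field names.
@dataclass
class Record:
    species: str
    observer: str
    start: date          # first day of the recorded observation period
    end: date            # last day of the recorded observation period
    uncertain_id: bool   # True if the identification was flagged as uncertain


def keep_record(rec: Record) -> bool:
    """Keep records resolved to a single day and with a certain identification."""
    single_day = (rec.end - rec.start).days == 0
    return single_day and not rec.uncertain_id


def filter_records(records):
    """Drop records spanning more than one day or with uncertain identification."""
    return [r for r in records if keep_record(r)]
```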

We computed two sets of response variables for each species group at a daily resolution. The first set aims to explore how the number of records has changed over time and consists of three response variables: the total number of records each day, the total number of unique observers each day, and the number of records per observer each day. The total number of records serves as a measure of how the amount of data collected has changed over time. The total number of observers and the number of records per observer provide additional information about how this change has come about.
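The three daily responses of the first set can be computed as in the sketch below, which assumes that the records of one species group arrive as (day, observer) pairs, a hypothetical minimal input format. Days without any records are filled in with zeros, since a count model needs them; records per observer is left undefined (`None`) on such days.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_responses(records, first_day, last_day):
    """Records, unique observers and records per observer for every day in a range."""
    n_records = defaultdict(int)
    observers = defaultdict(set)
    for day, observer in records:
        n_records[day] += 1
        observers[day].add(observer)

    out = {}
    day = first_day
    while day <= last_day:
        n_rec = n_records.get(day, 0)          # .get avoids inserting empty days
        n_obs = len(observers.get(day, ()))
        out[day] = {
            "records": n_rec,
            "observers": n_obs,
            # Undefined when nobody reported on that day.
            "records_per_observer": n_rec / n_obs if n_obs else None,
        }
        day += timedelta(days=1)
    return out
```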

The second set of response variables aims to explore whether the locations from which observations come vary over time. We used locations of ‘species lists’, defined as a set of observations of different species made by the same observer on the same day at the same locality (same geographical coordinates) (Szabo et al. 2010), instead of locations of individual records, to avoid repeating the spatial variable over multiple species records reported by an observer from a single location. From the locations of all lists within a day, we computed the mean of two spatial response variables: latitude and human population density. Via latitude, we investigated whether there were temporary or permanent shifts in the volume of observations towards more southern or northern locations over time, i.e. towards parts of the country that differ in reporting effort. Using human population density, we investigated how the proportion of records from highly populated versus more sparsely populated areas varied over time. We aggregated human population density at a 10 × 10 km resolution by summing population sizes from raster data at higher resolution (SCB Statistics Sweden; https://www.scb.se/vara-tjanster/oppna-data/oppna-geodata/statistik-pa-rutor/). We subsequently log(x + 1) transformed human population density at the location of species lists before computing the spatial average, so as not to give excessive weight to lists from the larger cities (the distribution of population sizes is highly skewed on the arithmetic scale; Mair and Ruete 2016). Human population distribution has not gone through any major changes during the study period, justifying the use of a snapshot map of population size.
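The list construction and the two daily spatial averages can be sketched as follows. The sketch assumes each record provides an observer id, a day, a latitude, and projected coordinates in metres (an assumption; any metric national grid would do), and that the fine-resolution population raster is given as a dict keyed by the lower-left corner of each cell. All function names are hypothetical, and cells missing from the raster are treated as unpopulated. The number of lists per day is returned as well, since it serves as the weight in the models.

```python
import math
from collections import defaultdict


def aggregate_population(fine_cells, block_m=10_000):
    """Sum fine raster cells, keyed by projected (x, y) in metres of the cell's
    lower-left corner, into block_m x block_m cells."""
    coarse = defaultdict(int)
    for (x, y), pop in fine_cells.items():
        coarse[(int(x // block_m), int(y // block_m))] += pop
    return dict(coarse)


def daily_list_means(records, coarse_pop, block_m=10_000):
    """Daily mean latitude and mean log(population + 1) over species lists.

    `records` holds (observer, day, lat, x, y) tuples (hypothetical layout).
    """
    # One list = one observer, one day, one locality; repeated records collapse.
    lists = {}
    for observer, day, lat, x, y in records:
        lists[(observer, day, x, y)] = lat

    by_day = defaultdict(list)
    for (observer, day, x, y), lat in lists.items():
        pop = coarse_pop.get((int(x // block_m), int(y // block_m)), 0)
        by_day[day].append((lat, math.log1p(pop)))  # log(x + 1) damps city skew

    out = {}
    for day, values in by_day.items():
        n = len(values)
        out[day] = {
            "mean_lat": sum(lat for lat, _ in values) / n,
            "mean_log_pop": sum(lp for _, lp in values) / n,
            "n_lists": n,  # later used as the regression weight
        }
    return out
```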

Weather

We extracted mean daily temperature and total precipitation across 0.25° grid squares covering the whole of Sweden (Cornes et al. 2018). Temperature has a strong seasonal and spatial pattern, and to reduce confounding with a general seasonal pattern in reporting we computed a detrended temperature variable. Specifically, we fitted a cyclic smooth seasonal curve to the time series of daily temperature from each grid square. The daily residuals from these curves were averaged across all grid squares to compute a daily temperature deviation index, which was used in analyses of reporting data. For precipitation, seasonal patterns are weaker, and we therefore used daily average precipitation across all grid points without detrending. As these variables are averages across the country, they will tend to reflect large-scale weather events rather than local weather.
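The detrending step can be illustrated with the sketch below. The authors fitted a cyclic smooth curve per grid square; here a low-order Fourier (harmonic) regression is swapped in as a simple cyclic stand-in, fitted by ordinary least squares in pure Python. The number of harmonics and the 365.25-day period are assumptions, not values from the study, and all function names are hypothetical.

```python
import math


def _solve(a, b):
    """Solve a small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _basis(doy, n_harmonics):
    """Intercept plus cos/sin pairs, periodic in day of year."""
    row = [1.0]
    for k in range(1, n_harmonics + 1):
        w = 2 * math.pi * k * doy / 365.25
        row += [math.cos(w), math.sin(w)]
    return row


def seasonal_residuals(doys, temps, n_harmonics=3):
    """Residuals of one grid square's daily temperature from a cyclic seasonal fit."""
    X = [_basis(d, n_harmonics) for d in doys]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, temps)) for i in range(p)]
    beta = _solve(xtx, xty)
    return [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(X, temps)]


def temperature_deviation_index(doys, temps_by_cell, n_harmonics=3):
    """Average the per-square seasonal residuals across squares, day by day."""
    resid = [seasonal_residuals(doys, t, n_harmonics) for t in temps_by_cell]
    return [sum(r[i] for r in resid) / len(resid) for i in range(len(doys))]
```

A purely seasonal series yields an index close to zero, while an unusually warm day in one square shows up as a positive deviation, diluted by the number of squares averaged.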

Models

We analysed temporal patterns in the data using generalised additive models (GAMs). We denote the response variable y_t, where t is the number of days since January 1 in year 2000. The basic structure of the model was:

$$y_{t} \sim s(t) + s(\mathrm{doy}) + \mathrm{dow} + \mathrm{holiday} + c_{\mathrm{temp}}(\mathrm{doy}) \cdot \mathrm{temp} + c_{\mathrm{rain}}(\mathrm{doy}) \cdot \mathrm{rain}$$

Here, s(t) is a smooth function representing slow, long-term changes in reporting over time; s(doy) is a smooth function of the day of year (doy), representing seasonal patterns; dow is a fixed factor with a level for each day of the week; holiday is a fixed factor with 15 levels (14 public holidays over all years, plus an extra level for no holiday); and c_temp(doy) and c_rain(doy) are coefficients for the effects of the temperature deviation index (temp) and precipitation (rain), which are allowed to vary over the season.
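To make the covariates concrete, the sketch below derives t, doy, the day-of-week factor, a weekend indicator and the holiday factor for a single date. The study's list of 14 public holidays is not reproduced here, so the `holidays` mapping is a caller-supplied, hypothetical input; taking t = 0 on 1 January 2000 is likewise an assumed convention.

```python
from datetime import date

ORIGIN = date(2000, 1, 1)  # assumed convention: t = 0 on 1 January 2000
DOW_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def covariates(day, holidays):
    """Model covariates for one date.

    `holidays` maps a date to a holiday name (hypothetical input); any date not
    in the mapping gets the extra 'none' level of the holiday factor.
    """
    return {
        "t": (day - ORIGIN).days,          # long-term time axis, in days
        "doy": day.timetuple().tm_yday,    # day of year, 1 to 366
        "dow": DOW_NAMES[day.weekday()],   # weekday() is 0 for Monday
        "weekend": day.weekday() >= 5,     # lets s(doy) differ on weekends
        "holiday": holidays.get(day, "none"),
    }
```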

We modelled the long-term trend s(t) using a thin plate spline with 10 degrees of freedom, and the seasonal smooth s(doy) using a thin plate spline with 40 degrees of freedom. For the seasonal smooth we did not use a cyclic spline (i.e. one with matching levels at the start and end of a year), even though it may seem the obvious choice, because we expected reporting to be higher at the beginning of a year than at its end, which could produce a discontinuity in the response variables at the turn of the year. This is mainly relevant for birds, for which many observers tally the number of species seen in a calendar year (Hui 2013), but we kept the same structure for all species groups. For the coefficient functions c_temp(doy) and c_rain(doy) we used cyclic cubic regression splines with 40 degrees of freedom. To investigate seasonal changes in the effect of weekends, we allowed the seasonal pattern s(doy) to differ between weekends and weekdays.

In a separate model fit, we added an interaction term between seasonal variation and long-term trends to investigate whether there were indications of a difference in the long-term trends among different parts of the season. For this, we used a tensor product interaction based on a thin plate spline with 10 degrees of freedom for the long-term component and 6 degrees of freedom for the seasonal component. We held the degrees of freedom for the seasonal component low in the interaction to reduce the risk of identifiability issues between the long-term and seasonal components. In these separate model fits we also added an interaction term between long-term trends and weekend effects.

When the response was the number of records or the number of observers, we used a log link and a negative binomial distribution with a quadratic variance-mean scaling. To model the average number of records per observer, we used the number of records minus the number of observers as the response under a negative binomial distribution, with the (log-transformed) number of observers as an offset. The number of observers was subtracted because the number of records is always at least as large as the number of observers; the response is therefore always non-negative and can attain the value zero, as assumed by the negative binomial model. For the spatial response variables, average latitude and average log human population density, we used a Gaussian response distribution. Since these responses were computed as averages, their variance will differ depending on the number of data points underlying the average. We therefore used the number of species lists as weights for the spatial response variables.
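The records-per-observer transformation can be written out as below, a sketch with hypothetical helper names. With the log link and the offset log(observers), the expected response is observers × exp(η), where η is the linear predictor without the offset, so the implied expected number of records per observer is 1 + exp(η). The negative binomial with quadratic variance-mean scaling has variance μ + μ²/θ for mean μ and size parameter θ.

```python
import math


def records_per_observer_response(n_records, n_observers):
    """Return (response, offset) for one day, or None if nobody reported.

    Every observer contributes at least one record, so records - observers is a
    non-negative count; log(observers) enters the linear predictor as an offset.
    """
    if n_observers == 0:
        return None
    if n_records < n_observers:
        raise ValueError("each observer must contribute at least one record")
    return n_records - n_observers, math.log(n_observers)


def expected_records_per_observer(eta):
    """E[records - observers] = observers * exp(eta), so E[records] / observers = 1 + exp(eta)."""
    return 1.0 + math.exp(eta)


def nb2_variance(mu, theta):
    """Negative binomial variance with quadratic mean scaling: mu + mu**2 / theta."""
    return mu + mu ** 2 / theta
```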

We tried to account for residual autocorrelation by first fitting the above model assuming no autocorrelation. From that fit we computed the empirical lag 1 autocorrelation of the residuals, which was then set as a fixed value in a final (second) fit of the model. Data for 2018 were withheld from model fitting and used to visually assess the predictive performance of the model. We fitted models using the functions gam and bam in the R package mgcv (Wood 2006).
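The two-step autocorrelation adjustment and the 2018 holdout can be sketched as below. The lag-1 estimator is the standard sample autocorrelation; in mgcv, a fixed AR1 parameter of this kind is supplied to `bam` through its `rho` argument, although the exact call used in the study is not given. Function names are hypothetical, and residuals are assumed to be ordered by day without gaps.

```python
from datetime import date


def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation of a residual series ordered in time."""
    n = len(residuals)
    mean = sum(residuals) / n
    c = [r - mean for r in residuals]
    denom = sum(v * v for v in c)
    num = sum(c[i] * c[i - 1] for i in range(1, n))
    return num / denom


def split_holdout(days, holdout_year=2018):
    """Indices of days used for fitting and of days withheld for forecast checks."""
    fit = [i for i, d in enumerate(days) if d.year < holdout_year]
    held_out = [i for i, d in enumerate(days) if d.year >= holdout_year]
    return fit, held_out
```

In this workflow the model would be fitted once without autocorrelation on the fitting days, the lag-1 value computed from its residuals, and the model refitted with that value held fixed.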

Results

Raw daily data on the number of records, observers and records per observer in 2018 are shown for all species groups in Fig. 1. Forecasts for these numbers captured the main seasonal patterns for most groups, but overestimated the number of records of fungi and butterflies during the peak period in 2018.

Long-term trends

The number of records and the number of observers increased approximately three- to fourfold (beetles, vascular plants, fungi) or more (birds, butterflies) over the study period (Fig. 2). Long-term patterns for the number of observers were qualitatively similar to the patterns for the number of records for most species groups (Fig. 2). Despite this, there were clear differences in patterns for the number of records per observer, i.e. the ratio of the two previous responses.

The number of bird records per observer increased over the study period (Fig. 2). This increase happened mainly in the first years after the launch of the online portal. For the other groups, the number of records per observer decreased during the later part of the study period. Thus, the increase in the total number of records for these groups was due to an increase in the number of observers, and happened despite observers on average submitting fewer records.

The average latitude of locations of species lists was largely stable across years for birds (Fig. 3), but lists tended to come from more densely populated areas during the latter part of the study period, mainly due to an increase in the first few years (Fig. 3). The mean latitude of lists increased over time for butterflies and vascular plants, and lists of fungi came, on average, from increasingly densely populated areas. Other patterns for latitude and population size were more complex or less evident (Fig. 3).

A substantial change was made to the reporting systems between 2013 and 2015 when the taxa-specific platforms were merged, in stages, into a joint system (Table 1). Following this change the number of observers and records generally increased, except perhaps for birds, while records per observer decreased for beetles, vascular plants and fungi (Fig. 2). For these groups, lists came from on average more densely populated areas after the merger.

Within-year patterns

Seasonal

Each species group showed a somewhat different seasonal pattern (Fig. 4): bird reports (records, observers, records per observer) were multimodal; reports of the other groups mostly followed peaked curves, but peaked at different times of the year, with a plateau for beetle observers, abrupt transitions between small and large numbers for vascular plant observers, and multimodality in beetle records per observer.

The number of records and the number of observers for birds showed a complex pattern, with a sharp peak immediately after New Year as well as broader peaks during the migration periods, notably in spring but also in autumn. For all groups the average number of records per observer tended to peak at approximately the same time of the year as the total number of records, except for beetles, for which there was no clear peak (Fig. 4).

Most groups showed strong seasonal patterns in the average location of lists (Fig. 5). Generally, lists on average came from less densely populated and more northerly located areas during summer than during winter. Here too, bird reports had the most complex seasonal pattern (Fig. 5).

Weekends

The number of records and the number of observers tended to be higher on weekends for all species groups (Fig. 4). This effect was most pronounced for birds, with the number of records increasing by a factor of 2 or more during some periods of the year, and for beetles. For birds, the difference between weekends and weekdays was larger in all seasons outside the summer holiday period (late June and July), but we found no clear evidence of seasonally changing weekend effects for the other species groups. For birds, the number of records per observer was also higher on weekends and holidays than on weekdays. Weekend effects on list locations were mainly small, but at weekends bird lists in spring and autumn tended to come from less populated areas, and lists of fungi from lower latitudes (Fig. 5).

Holidays

Holiday effects were mostly positive and strongest for birds, with almost double the expected number of records on some public holidays in spring (Fig. 6). New Year’s Day had the strongest holiday effect and often had the most bird reports of all days of the year, despite the number of bird species present in Sweden being considerably lower in winter. There were also potential holiday effects for beetles, butterflies and vascular plants, but these signals were weaker and less certain. There were no clear holiday effects for fungi, but Sweden has few public holidays in autumn, when most fungi are reported. We found no strong holiday effects on average list locations, although there was some indication of bird reports coming from less populated areas on some spring holidays (Fig. S1).