Original Articles
Is prediction of species richness from stacked species distribution models biased by habitat saturation?

https://doi.org/10.1016/j.ecolind.2019.105970

Highlights

  • Habitat saturation impacts predictions from Species Distribution Models (SDMs).

  • Habitat saturation biases predictions from stacked SDMs (S-SDMs).

  • Probability-based richness predicts local SR without bias, regardless of habitat saturation.

  • Comparing different S-SDM predictions can shed light on community assembly processes.

Abstract

Several studies have proposed predicting Species Richness (SR) by combining the predictions of independent Species Distribution Models (SDMs) (the 'predict first, assemble later' strategy). SDM outputs can be combined in two alternative ways: either by summing predicted presence probabilities at each location, or by summing binary presence predictions obtained after thresholding the probabilities. Species can occupy various proportions of their suitable habitats (i.e., have various levels of habitat saturation), which can cause discrepancies when predicting their presences through SDMs. Furthermore, these discrepancies can be amplified when the predictions of individual SDMs are combined to predict SR. In this article, we simulated species distributions with varying habitat saturation (i.e., the proportion of suitable habitat occupied by a species), and we compared observed richness with that predicted by the alternative approaches. We found that probability-based richness is not biased by the level of habitat saturation, while threshold-based richness over-predicts richness at low habitat saturation and under-predicts it at high habitat saturation. Probability-based richness should thus be preferred when predicting species richness locally. Nonetheless, threshold-based richness represents species richness constrained by environmental filtering only, and is thus a useful indicator of potential species richness when species fully saturate their habitats. The systematic comparison of probability-based and threshold-based richness predictions can therefore reveal the importance of habitat saturation and help identify the community assembly mechanisms at play.

Introduction

Species Richness is an Essential Biodiversity Variable (EBV) (Pereira et al., 2013) that should be assessed, monitored and compared across space, time, and ecological contexts. Different models have been proposed to predict richness in diverse ecological contexts and at large spatial scales (Dodson, 1992, Graham and Hijmans, 2006, O’Brien, 1998), with the aim of identifying biodiversity hotspots (Mazel et al., 2014, Myers et al., 2000), targeting effective management practices (Chown et al., 2003), quantifying biodiversity changes (Newbold et al., 2015) and predicting ecosystem functioning (Cardinale et al., 2012).

Several methods can be used to predict richness, depending on which ecological processes are at play. For example, Macro-Ecological Models (MEMs) directly predict richness at any location as a function of local environmental variables. These models consider the influence of environmental filtering and energy limits on richness (Hurlbert and Stegen, 2014). Because site-species data are first aggregated to estimate richness, which is then related to environmental variables, these approaches are called ‘assemble first, predict later’ (Ferrier and Guisan, 2006). However, a growing number of global and local biodiversity databases record species occurrences rather than local assemblage composition (GBIF, 2019, Sullivan et al., 2009, Tedesco et al., 2017). An alternative approach has been to first model the occurrences of each species independently, at any location, from environmental variables using species distribution models (SDMs) (Guisan and Thuiller, 2005, Guisan and Zimmermann, 2000), and then to deduce potential local richness by combining (=stacking) the predictions of individual SDMs (Calabrese et al., 2014, D’Amen et al., 2015b, Gavish et al., 2017, Scherrer et al., 2018, Schmitt et al., 2017). This is known as the ‘predict first, assemble later’ approach (Ferrier and Guisan, 2006). When stacking SDMs, each SDM predicts the occurrences of one species independently from environmental variables (Guisan and Zimmermann, 2000); the predictions for the different species are then summed to predict richness at the assemblage level. Stacked SDMs (S-SDMs) predict observed richness as well as or better than macro-ecological models (Dubuis et al., 2011, Guisan and Rahbek, 2011), but there is still no consensus on which stacking method to use to predict richness reliably with S-SDMs (Scherrer et al., 2018).

Two main methods exist to stack SDMs (Dubuis et al., 2011, Pineda and Lobo, 2009, Scherrer et al., 2018). Some authors suggested using thresholds to convert probabilities into binary predictions (presence or absence) (Jiménez-Valverde and Lobo, 2007, Liu et al., 2005). These binary predictions are then summed to predict richness at the local scale (hereafter threshold-based richness). One of the main arguments for converting the probabilities provided by SDMs into binary predictions is that most practical applications need binary maps (Jiménez-Valverde and Lobo, 2007). A caveat of binary predictions is that they translate continuous responses of species along environmental gradients into binary responses, implying more abrupt shifts from presence to absence between suitable and unsuitable conditions (Meynard and Kaplan, 2012). When predicted probabilities are below the threshold, the model predicts only absences, and when they are above it, only presences. Close to the threshold value, a small change in predicted presence probability can switch the binary prediction from absence to presence. Meynard and Kaplan (2012) showed that presence predictions using thresholds fit observed presences only when a species has a threshold-like response, while error increases when a species' response is more gradual. The more species with a gradual response along the environment are considered, the greater the error when predicting richness. SDMs also directly provide continuous presence probabilities as outputs (Guisan and Thuiller, 2005), and converting them into binary predictions adds a step compared with directly summing individual model predictions. Summing the probabilities from individual species models provides the mathematical expectation of the number of species locally present, assuming that species occurrences are independent (Calabrese et al., 2014, Violle et al., 2011); we hereafter call this probability-based richness.
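The two stacking rules can be contrasted in a few lines. This is a minimal sketch with hypothetical predicted probabilities and a hypothetical common threshold of 0.5; in practice thresholds are species-specific and chosen with one of the criteria reviewed by Liu et al. (2005). The helper names `probability_based_richness` and `threshold_based_richness` are illustrative, not taken from any package.

```python
"""Toy comparison of the two S-SDM stacking rules (illustrative values only)."""


def probability_based_richness(probs):
    """Sum predicted presence probabilities across species at each site.

    If species occurrences are independent, this is the expected number
    of species present at the site.
    """
    return [sum(site) for site in zip(*probs)]


def threshold_based_richness(probs, thresholds):
    """Convert each species' probabilities to 0/1 with its own threshold, then sum."""
    binary = [
        [1 if p >= t else 0 for p in species_probs]
        for species_probs, t in zip(probs, thresholds)
    ]
    return [sum(site) for site in zip(*binary)]


# Hypothetical predicted presence probabilities: rows = species, columns = sites.
probs = [
    [0.9, 0.6, 0.1],
    [0.4, 0.55, 0.2],
    [0.3, 0.45, 0.05],
]
# Hypothetical per-species thresholds.
thresholds = [0.5, 0.5, 0.5]

print(probability_based_richness(probs))
print(threshold_based_richness(probs, thresholds))
```

Sites 1 and 2 share the same expected richness (1.6 species), yet threshold-based richness gives 1 and 2 species, because species 2 crosses the threshold between site 1 (0.4) and site 2 (0.55). This is the sensitivity near the threshold described above.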

A basic implicit assumption of SDMs is that only environmental conditions determine species occurrence, depending on the species' fundamental niche (Guisan and Zimmermann, 2000). Additional processes, such as dispersal limitation, competitive exclusion and local extinction dynamics, should also affect realized occupancy patterns (Pulliam, 2000). By neglecting the contribution of processes that shape realized species distributions beyond fundamental niche requirements, SDM predictions, and thus richness predictions, are likely to be biased (Václavík and Meentemeyer, 2012). For instance, due to source-sink dynamics, some species can occupy less suitable sites, and thus be distributed outside the suitable habitat delimited from the presence probabilities predicted by SDMs. In addition, a species that is less often present across its suitable habitat would have a lower predicted presence probability than a species present in all its suitable habitats, even though the binary distribution predicted by an SDM would be the same. We define the habitat saturation of a species as a parameter that affects its occurrence probability given environmental suitability. Here saturation is a species-level property, not an upper bound on richness in assemblages as proposed by Mateo et al. (2017). When species display low habitat saturation, their realized presence probabilities decrease, so that the summed predicted probability decreases as well. In contrast, the threshold-based presence prediction is not affected by habitat saturation: even if the species-specific threshold changes with habitat saturation, the prediction remains binary (presence or absence) (Meynard and Kaplan, 2012). We thus expect an increasing difference between threshold-based and observed richness at lower (or higher) habitat saturation.
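The argument about habitat saturation can be illustrated with a toy model. The sketch below assumes, hypothetically, that a species' presence probability equals its environmental suitability multiplied by its habitat saturation, and that each species' threshold is set at half of its maximum predicted probability (an illustrative rule, not one taken from the article). Under these assumptions the threshold moves with saturation, so binary predictions do not change, while probability-based richness scales with saturation.

```python
"""Sketch: how habitat saturation affects the two stacking rules.

Hypothetical assumptions: presence probability = saturation x suitability,
and each species' threshold = half of its maximum predicted probability.
"""


def presence_probabilities(suitability, saturation):
    return [saturation * s for s in suitability]


def binary_prediction(probs):
    # Hypothetical threshold rule: half of the species' maximum probability.
    threshold = 0.5 * max(probs)
    return [1 if p >= threshold else 0 for p in probs]


def stacked_richness(suitabilities, saturation):
    """Return (probability-based, threshold-based) richness per site."""
    probs = [presence_probabilities(s, saturation) for s in suitabilities]
    prob_richness = [sum(site) for site in zip(*probs)]
    binaries = [binary_prediction(p) for p in probs]
    thr_richness = [sum(site) for site in zip(*binaries)]
    return prob_richness, thr_richness


# Hypothetical suitabilities of three species (rows) at four sites (columns).
suitabilities = [
    [1.0, 0.8, 0.3, 0.0],
    [0.2, 0.9, 1.0, 0.4],
    [0.0, 0.1, 0.6, 1.0],
]

for saturation in (1.0, 0.5, 0.2):
    prob_r, thr_r = stacked_richness(suitabilities, saturation)
    print(saturation, [round(r, 2) for r in prob_r], thr_r)
```

In this example threshold-based richness stays at [1, 2, 2, 1] at every saturation level, whereas probability-based richness, which is the expected number of species present under this toy model, shrinks in proportion to saturation; at saturations of 0.5 and 0.2 the threshold-based values exceed it at every site.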

Predicted presence probabilities partly reflect the ability of species to saturate their niche. We therefore expect probability-based richness to best predict actual richness, whereas we expect threshold-based richness to over-predict it: threshold-based richness rather represents the pool of species able to occur under given environmental conditions. To test these expectations, we simulated virtual species with varying habitat saturation and niche requirements (Hirzel et al., 2001, Meynard et al., 2019). We used S-SDMs to predict richness from environmental conditions, computed both threshold- and probability-based richness, and compared how the predictions were affected by habitat saturation. Probability-based richness followed observed richness whatever the habitat saturation, while threshold-based richness matched observed richness only when habitat saturation was 100%. Threshold-based richness considered only the environmental requirements of species, and could thus be used as a prediction of potential richness based solely on local environmental conditions. Potential richness could then be compared with other richness predictions that incorporate other ecological processes.

Species assemblage simulations

Individual species simulation. We simulated a linear environmental gradient of 2000 values, from 1 to 2000. We then used the virtualspecies R package, version 1.4-2 (Leroy et al., 2016), to define 100 species independently, each with a quadratic environmental response $s_{i,k} = a \times \mathrm{Env}_k^2 + b \times \mathrm{Env}_k$, where $s_{i,k}$ is the environmental suitability of species $i$ in assemblage $k$ and $\mathrm{Env}_k$ is the environmental variable. $a$ was drawn from a uniform distribution between −20 and −0.01. $b$ was chosen as $b = -2am$, where $m$ was drawn from a
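The quadratic response can be sketched as follows. Everything beyond the stated formula is an assumption: we take b = −2am so that the optimum of a·Env² + b·Env lies at Env = m (since a < 0), draw m uniformly from the integers of the gradient because its distribution is cut off in the text, and min-max rescale suitability to [0, 1] for readability; the virtualspecies package's own rescaling may differ. The helper names are hypothetical, and the random draws are seeded only for reproducibility.

```python
"""Sketch of quadratic environmental responses along a 1..2000 gradient.

Hypothetical choices: b = -2*a*m (optimum at Env = m), m drawn uniformly
from the gradient, and min-max rescaling of suitability to [0, 1].
"""
import random

GRADIENT = list(range(1, 2001))  # environmental values 1..2000


def quadratic_suitability(a, m, env=GRADIENT):
    b = -2 * a * m  # puts the maximum of a*x**2 + b*x at x = m when a < 0
    raw = [a * x ** 2 + b * x for x in env]
    lo, hi = min(raw), max(raw)
    return [(s - lo) / (hi - lo) for s in raw]


def simulate_species(n_species=100, seed=1):
    rng = random.Random(seed)
    species = []
    for _ in range(n_species):
        a = rng.uniform(-20, -0.01)
        m = rng.randint(1, 2000)  # hypothetical distribution of m
        species.append((a, m, quadratic_suitability(a, m)))
    return species


species = simulate_species()
for a, m, suit in species[:3]:
    peak = GRADIENT[suit.index(max(suit))]
    print(f"a={a:.2f}  m={m}  peak at Env={peak}")
```

Because a only sets the curvature while m sets the location of the optimum, drawing them separately yields species that differ in both niche breadth and niche position.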

Results

Binary predictions (solid segments above and below the plot) showed few differences across habitat saturation levels (Fig. 2). They were identical for environmental values 1–273, between 467 and 1514, and above 1720. In total, binary predictions were the same whatever the habitat saturation for over 80% of the environmental values. However, binary predictions changed abruptly from absences to presences and from presences to absences for environmental values close to 500 and to
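The "over 80%" figure follows from the reported ranges. A quick check, assuming that the bounds 1–273 and 467–1514 are inclusive and that "above 1720" means 1721–2000 on the 1..2000 gradient:

```python
"""Check of the 'over 80%' figure, assuming inclusive bounds on 1..2000."""

GRADIENT = range(1, 2001)


def same_prediction(env):
    # Ranges reported in the text where binary predictions did not depend on
    # habitat saturation; treating 467 and 1514 as inclusive is an assumption.
    return 1 <= env <= 273 or 467 <= env <= 1514 or env > 1720


n_same = sum(1 for env in GRADIENT if same_prediction(env))
share = n_same / len(GRADIENT)
print(n_same, share)
```

This gives 273 + 1048 + 280 = 1601 of 2000 values (80.05%). If 467 and 1514 themselves were excluded, the count would drop to 1599 (79.95%), so the statement relies on inclusive bounds.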

Discussion

We designed a virtual experiment simulating species occurrences along an environmental gradient and fitted binomial GLM-based species distribution models to these data. The binary threshold-based presence prediction represented the potential habitat of each species based on its fundamental niche (Guisan and Zimmermann, 2000), whatever its actual habitat saturation. In contrast, the range and average values of predicted presence probabilities depended on habitat saturation, for a given
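The Discussion refers to binomial GLM-based SDMs. A minimal, self-contained sketch of such a model follows; it is not the authors' implementation. We assume a logit link with a quadratic term, logit(p) = b0 + b1·z + b2·z², where z is the standardised environmental value, and fit it by Newton-Raphson with step-halving to fractional responses equal to hypothetical true presence probabilities (habitat saturation × a Gaussian-shaped suitability with optimum 1000 and width 500, both invented for illustration). In practice an existing routine such as R's `glm(..., family = binomial)` would be used; the pure-Python version only keeps the example self-contained.

```python
"""Sketch: binomial GLM SDM (logit link, quadratic term) fitted by Newton-Raphson."""
import math

ENV = list(range(1, 2001))
MEAN = sum(ENV) / len(ENV)
SD = math.sqrt(sum((x - MEAN) ** 2 for x in ENV) / len(ENV))
Z = [(x - MEAN) / SD for x in ENV]  # standardised environment


def sigmoid(eta):
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def log_lik(beta, y):
    """Binomial log-likelihood up to a constant; log(1 + e^eta) computed stably."""
    total = 0.0
    for z, yi in zip(Z, y):
        eta = beta[0] + beta[1] * z + beta[2] * z * z
        total += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total


def solve3(h, g):
    """Solve the 3x3 system h x = g by Gaussian elimination with partial pivoting."""
    a = [row[:] + [gi] for row, gi in zip(h, g)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_quadratic_logit(y, n_iter=100, tol=1e-10):
    beta = [0.0, 0.0, 0.0]
    for _ in range(n_iter):
        grad = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]  # Fisher information X'WX
        for z, yi in zip(Z, y):
            xs = (1.0, z, z * z)
            mu = sigmoid(beta[0] + beta[1] * z + beta[2] * z * z)
            for i in range(3):
                grad[i] += xs[i] * (yi - mu)
                for j in range(3):
                    info[i][j] += mu * (1.0 - mu) * xs[i] * xs[j]
        step = solve3(info, grad)
        # Step-halving: never accept an update that lowers the log-likelihood.
        t, old = 1.0, log_lik(beta, y)
        new = [b + s for b, s in zip(beta, step)]
        while log_lik(new, y) < old - 1e-9 and t > 1e-8:
            t /= 2
            new = [b + t * s for b, s in zip(beta, step)]
        beta = new
        if max(abs(t * s) for s in step) < tol:
            break
    return beta


def true_probabilities(saturation, optimum=1000.0, width=500.0):
    # Hypothetical Gaussian-shaped suitability scaled by habitat saturation.
    return [saturation * math.exp(-((x - optimum) / width) ** 2) for x in ENV]


def fitted_probabilities(beta):
    return [sigmoid(beta[0] + beta[1] * z + beta[2] * z * z) for z in Z]


for saturation in (1.0, 0.5):
    fitted = fitted_probabilities(fit_quadratic_logit(true_probabilities(saturation)))
    peak = ENV[fitted.index(max(fitted))]
    print(saturation, round(sum(fitted) / len(fitted), 3), round(max(fitted), 3), peak)
```

Because the model includes an intercept, its fitted probabilities sum to the responses at convergence, so halving habitat saturation halves the mean predicted probability and lowers its maximum, while the fitted optimum stays near 1000.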

CRediT authorship contribution statement

Matthias Grenié: Conceptualization, Methodology, Software, Formal analysis, Writing - original draft, Writing - review & editing. Cyrille Violle: Conceptualization, Supervision, Writing - review & editing. François Munoz: Conceptualization, Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We would like to thank Pierre Denelle and Christine Meynard for helpful discussions. MG was supported by the ENS de Lyon. This study was supported by the European Research Council (ERC) Starting Grant Project ‘ecophysiological and biophysical constraints on domestication in crop plants’ (grant ERC-StG-2014-639706-CONSTRAINTS) and by the French Foundation for Research on Biodiversity (FRB; <www.fondationbiodiversite.fr>) in the context of the CESAB project ‘causes and consequences of functional

References (70)

  • V. Boucher-Lalonde et al.

    A consistent occupancy–climate relationship across birds and mammals of the Americas

    Oikos

    (2014)
  • V. Boucher-Lalonde et al.

    How are tree species distributed in climatic space? A simple and general pattern

    Glob. Ecol. Biogeogr.

    (2012)
  • J.H. Brown

    On the relationship between abundance and distribution of species

    Am. Nat.

    (1984)
  • J.M. Calabrese et al.

    Stacking species distribution models and adjusting bias by linking them to macroecological models

    Glob. Ecol. Biogeogr.

    (2014)
  • B.J. Cardinale et al.

    Biodiversity loss and its impact on humanity

    Nature

    (2012)
  • D.W. Carstensen et al.

    Introducing the biogeographic species pool

    Ecography

    (2013)
  • J.M. Chase et al.

    Disentangling the importance of ecological niches from stochastic processes across scales

    Philos. Trans. R. Soc. B: Biol. Sci.

    (2011)
  • S.L. Chown et al.

    Energy, species richness, and human population size: conservation implications at a national scale

    Ecol. Appl.

    (2003)
  • M. D’Amen et al.

    Using species richness and functional traits predictions to constrain assemblage predictions from stacked species distribution models

    J. Biogeogr.

    (2015)
  • M. D’Amen et al.

    Predicting richness and composition in mountain insect communities at high resolution: a new test of the SESAM framework

    Glob. Ecol. Biogeogr.

    (2015)
  • F. de Bello et al.

    Functional species pool framework to test for biotic effects on community assembly

    Ecology

    (2012)
  • S. Dodson

    Predicting crustacean zooplankton species richness

    Limnol. Oceanogr.

    (1992)
  • A. Dubuis et al.

    Predicting spatial patterns of plant species richness: a comparison of direct macroecological and species stacking modelling approaches

    Divers. Distrib.

    (2011)
  • O. Eriksson

    Regional dynamics of plants: a review of evidence for remnant, source-sink and metapopulations

    Oikos

    (1996)
  • S. Ferrier et al.

    Spatial modelling of biodiversity at the community level

    J. Appl. Ecol.

    (2006)
  • Y. Gavish et al.

    Accounting for biotic interactions through alpha-diversity constraints in stacked species distribution models

    Methods Ecol. Evol.

    (2017)
  • GBIF, 2019. What is...
  • C.H. Graham et al.

    A comparison of methods for mapping species ranges and species richness

    Glob. Ecol. Biogeogr.

    (2006)
  • A. Guisan et al.

    SESAM – a new framework integrating macroecological and species distribution models for predicting spatio-temporal patterns of species assemblages

    J. Biogeogr.

    (2011)
  • A. Guisan et al.

    Predicting species distribution: offering more than simple habitat models

    Ecol. Lett.

    (2005)
  • S.P. Hubbell

    The Unified Neutral Theory of Biodiversity and Biogeography, Monographs in Population Biology

    (2001)
  • A.H. Hurlbert et al.

    When should species richness be energy limited, and how would we know?

    Ecol. Lett.

    (2014)
  • G.E. Hutchinson

    Concluding remarks

    Cold Spring Harb. Symp. Quant. Biol.

    (1957)
  • P. Keil et al.

    Downscaling of species distribution models: a hierarchical approach

    J. Appl. Ecol.

    (2013)
  • B. Leroy et al.

    Virtualspecies, an R package to generate virtual species distributions

    Ecography

    (2016)