Extrapolation in species distribution modelling. Application to Southern Ocean marine species

https://doi.org/10.1016/j.pocean.2020.102438

Highlights

  • Species distribution models are increasingly used in Southern Ocean studies.

  • Extrapolation in model predictions can be up to 78% of the total projection area.

  • Increasing the number of presence-only records decreases extrapolation.

  • Restricting the projections with physiological information decreases extrapolation.

  • Providing extrapolation uncertainty maps along with model outputs is essential.

Abstract

Species distribution modelling (SDM) has been increasingly applied to Southern Ocean case studies over the past decades, to map the distribution of species and highlight environmental settings driving species distribution. Predictive models have been commonly used for conservation purposes and supporting the delineation of marine protected areas, but model predictions are rarely associated with extrapolation uncertainty maps.

In this study, we used the Multivariate Environmental Similarity Surface (MESS) index to quantify model uncertainty associated with extrapolation. Taking as reference the dataset of environmental conditions at which species presence-only records are modelled, extrapolation corresponds to the part of the projection area where at least one environmental value falls outside the range covered by this reference dataset.
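The extrapolation criterion above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: it assumes the environmental descriptors are stored as NumPy arrays with one row per presence record (reference) or per projection cell and one column per descriptor, and the function name `extrapolation_mask` is hypothetical. It flags the same cells as a negative MESS score, because that score only becomes negative when a descriptor lies strictly outside the reference range.

```python
import numpy as np

def extrapolation_mask(reference: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Flag projection cells where at least one descriptor is outside the reference range.

    reference:  (n_records, n_descriptors) values at presence-only records
    projection: (n_cells, n_descriptors) values over the projection area
    """
    lo = reference.min(axis=0)  # per-descriptor minimum of the calibration range
    hi = reference.max(axis=0)  # per-descriptor maximum of the calibration range
    outside = (projection < lo) | (projection > hi)
    return outside.any(axis=1)  # True = extrapolation

# Toy example with two descriptors (depth in m, temperature in °C); values are made up
reference = np.array([[100.0, -1.5], [500.0, 0.5], [900.0, 1.0]])
projection = np.array([[300.0, 0.0],    # inside both ranges
                       [2000.0, 0.0],   # deeper than any record
                       [300.0, 3.0],    # warmer than any record
                       [700.0, -1.0]])  # inside both ranges
mask = extrapolation_mask(reference, projection)
print(mask)  # [False  True  True False]
print(f"{100 * mask.mean():.0f}% of the projection area is extrapolation")  # 50%
```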

Six abundant and common sea star species of Southern Ocean marine benthic communities were used as case studies. Results show that up to 78% of the projection area is extrapolation, i.e. lies beyond the conditions used for model calibration. Restricting the projection space using known species ecological requirements (e.g. maximal depth, upper temperature tolerance) and increasing the size of presence datasets both proved effective in reducing the proportion of extrapolation areas. We estimate that a 2- to 3-fold increase in sampling effort should help reduce the proportion of extrapolation areas to 10% for the six studied species.
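The effect of restricting the projection space can be illustrated with synthetic data. In this hypothetical sketch, cells deeper than an assumed maximal depth or warmer than an assumed upper temperature tolerance are removed before the share of extrapolated cells is computed. The thresholds, value ranges and the helper name `extrapolation_share` are illustrative assumptions, not values or code from the study.

```python
import numpy as np

def extrapolation_share(reference, projection):
    """Fraction of projection cells with at least one descriptor outside the reference range."""
    lo, hi = reference.min(axis=0), reference.max(axis=0)
    return ((projection < lo) | (projection > hi)).any(axis=1).mean()

rng = np.random.default_rng(1)
# Hypothetical projection area: depth (m) and bottom temperature (°C) for 50,000 cells
projection = np.column_stack([rng.uniform(0, 5000, 50_000), rng.uniform(-2, 8, 50_000)])
# Hypothetical presence records confined to shallow, cold waters
reference = np.column_stack([rng.uniform(50, 1500, 200), rng.uniform(-1.9, 2.0, 200)])

MAX_DEPTH = 2000.0  # assumed maximal depth of the species (m)
UPPER_TEMP = 4.0    # assumed upper temperature tolerance (°C)

# Keep only cells the species could physiologically occupy
within_tolerance = (projection[:, 0] <= MAX_DEPTH) & (projection[:, 1] <= UPPER_TEMP)
full = extrapolation_share(reference, projection)
restricted = extrapolation_share(reference, projection[within_tolerance])
print(f"whole projection area: {100 * full:.1f}% extrapolation")
print(f"restricted projection area: {100 * restricted:.1f}% extrapolation")
```

In this toy setting, most of the removed cells are ones that would have been flagged as extrapolation anyway, which is why the share drops; with real descriptors the size of the reduction depends on how well the thresholds match the conditions covered by the records.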

Considering the unexpectedly high levels of extrapolation uncertainty measured in SDM predictions, we strongly recommend that studies report the level of extrapolation. Until improved datasets become available, adapting modelling methods and providing such uncertainty information in distribution modelling studies are necessary to accurately interpret model outputs and assess their reliability.

Introduction

Among the broad array of analytical tools developed for marine ecology studies over the last two decades, Species Distribution Modelling (SDM) has been increasingly used (Peterson, 2001, Elith et al., 2006, Austin, 2007, Gobeyn et al., 2019) and applied to Southern Ocean pelagic (Pinkerton et al., 2010, Freer et al., 2019) and benthic organisms (Loots et al., 2007, Pierrat et al., 2012, Basher and Costello, 2016, Xavier et al., 2016, Gallego et al., 2017, Guillaumot et al., 2018a, Guillaumot et al., 2018b, Fabri-Ruiz et al., 2019, Jerosch et al., 2019), and even to marine mammals (Nachtsheim et al., 2017). SDM is a complementary approach to individual-based modelling and eco-physiological experiments that can quickly and synthetically identify environmental correlates of species distribution (Brotons et al., 2012, Feng and Papeş, 2017, Feng et al., 2020). SDM is also used to define the spatial range of species distributions (Nori et al., 2011, Walsh and Hudiburg, 2018) and can provide decision criteria for conservation purposes (Guisan et al., 2013, Marshall et al., 2014). For instance, it is currently used in proposals developed by national committees of the CCAMLR (Commission for the Conservation of Antarctic Marine Living Resources) to support the definition and delineation of marine protected areas (Ballard et al., 2012, CCAMLR report WG-FSA-15/64, 2020, Arthur et al., 2018).

Applying SDM to Southern Ocean case studies is particularly challenging because of major constraints and biases that may reduce modelling performance. As in many oceanographic studies, access to environmental data with high temporal and spatial resolution is difficult (Davies et al., 2008, Robinson et al., 2011). Antarctic coastal areas, in particular, are rarely accessed and documented because of logistical constraints; for example, access is impossible during the austral winter because of sea ice cover (De Broyer et al., 2014). The availability of species absence records also limits modelling performance and model calibration (Brotons et al., 2004, Wisz and Guisan, 2009). Models are usually based on a limited number of presence-only records and a limited number of sampling sites, both of which are spatially aggregated in the vicinity of scientific stations, where access is frequent and where datasets from different seasons have been compiled over decades and even longer (De Broyer et al., 2014, Guillaumot et al., 2018a, Fabri-Ruiz et al., 2019, Guillaumot et al., 2019).

When generating an SDM, the model is fitted to data covering a given range of values for each environmental descriptor (i.e. the calibration range). When model predictions are transferred, part of the environment may include additional conditions outside this calibration range: these are non-analog conditions, and the model extrapolates (Randin et al., 2006, Williams and Jackson, 2007, Williams et al., 2007, Fitzpatrick and Hargrove, 2009, Owens et al., 2013, Yates et al., 2018). Given the limited number of presence-only records available for each marine benthic species, and the poor quality and precision of the environmental descriptors available for modelling Southern Ocean species distributions (Guillaumot et al., 2018a, Fabri-Ruiz et al., 2019), a large proportion of cells can be expected to lie beyond the calibration range of the model, i.e. to be extrapolations.
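The expectation that few records lead to a large share of extrapolated cells can be checked with a small simulation. The sketch below is hypothetical: descriptors are standardised random values, presence records are drawn from the same distribution as the projection cells (which real, spatially aggregated Southern Ocean records are not), and subsamples are nested so that adding records can only widen the calibration range. The helper name `extrapolation_share` is illustrative.

```python
import numpy as np

def extrapolation_share(reference, projection):
    """Fraction of projection cells with at least one descriptor outside the reference range."""
    lo, hi = reference.min(axis=0), reference.max(axis=0)
    return ((projection < lo) | (projection > hi)).any(axis=1).mean()

rng = np.random.default_rng(42)
n_descriptors = 5                                      # hypothetical number of descriptors
projection = rng.normal(size=(20_000, n_descriptors))  # standardised values of projection cells
records = rng.normal(size=(400, n_descriptors))        # hypothetical presence-only records

# Nested subsamples: each larger sample contains the smaller ones,
# so the calibration range can only widen as records are added.
order = rng.permutation(len(records))
sizes = [25, 50, 100, 200, 400]
curve = [extrapolation_share(records[order[:k]], projection) for k in sizes]
for k, share in zip(sizes, curve):
    print(f"{k:4d} records -> {100 * share:5.1f}% of cells extrapolated")
```

With spatially aggregated records, the calibration range would likely widen more slowly as records are added, so real curves may stay above this idealised one.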

The Multivariate Environmental Similarity Surface (MESS) approach analyses spatial extrapolation by extracting the environmental values at presence-only records and identifying areas where environmental conditions fall outside the range of conditions contained in the calibration area (Elith et al., 2010). The method considers that extrapolation occurs when at least one environmental descriptor value is outside the range of the environmental envelope used for model calibration (more details are given in Appendix 4).
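The continuous MESS score can be sketched as below, following our reading of the definition in Elith et al. (2010): for each descriptor, f is the percentage of reference records with a value below the cell's value, similarity is graded inside the range, and it becomes negative outside it; the cell's score is the minimum over descriptors. This is not the MaxEnt or R `dismo` implementation, and the function name `mess` and the returned "most dissimilar descriptor" index are illustrative choices.

```python
import numpy as np

def mess(reference: np.ndarray, projection: np.ndarray):
    """MESS scores following the definition in Elith et al. (2010), as we read it.

    reference:  (n_records, n_descriptors) values at presence-only records
    projection: (n_cells, n_descriptors) values over the projection area
    Returns (scores, most_dissimilar): the minimum similarity per cell and the
    index of the descriptor that produced it. Negative scores = extrapolation.
    """
    reference = np.asarray(reference, dtype=float)
    projection = np.asarray(projection, dtype=float)
    n_records, n_descriptors = reference.shape
    sims = np.empty(projection.shape, dtype=float)
    for i in range(n_descriptors):
        ref = np.sort(reference[:, i])
        lo, hi = ref[0], ref[-1]
        span = (hi - lo) if hi > lo else 1.0  # guard against a constant descriptor
        p = projection[:, i]
        # f: percentage of reference records with a value strictly below p
        f = 100.0 * np.searchsorted(ref, p, side="left") / n_records
        s = np.where(f <= 50.0, 2.0 * f, 2.0 * (100.0 - f))   # inside the range
        s = np.where(f == 0.0, 100.0 * (p - lo) / span, s)    # at or below the minimum
        s = np.where(f == 100.0, 100.0 * (hi - p) / span, s)  # above the maximum
        sims[:, i] = s
    return sims.min(axis=1), sims.argmin(axis=1)

reference = np.array([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0], [3.0, 40.0]])
projection = np.array([[1.5, 25.0], [5.0, 25.0], [1.5, 0.0]])
scores, most_dissimilar = mess(reference, projection)
print(scores)
print(most_dissimilar)  # [0 0 1]
```

Here the first cell sits at the median of both descriptors and scores 100, while the second and third cells get negative scores, driven by the first and second descriptor respectively. Mapping `scores < 0` gives the binary extrapolation area.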

The MESS approach was initially used to determine the environmental barriers to the invasion of the cane toad in Australia, when facing new environments and under future conditions (Elith et al., 2010). Implemented in MaxEnt (Elith et al., 2011), MESS was subsequently used by several authors to define the climatic limits to the colonisation of new environments by non-native species, such as the American bullfrog in Argentina (Nori et al., 2011), to study contrasts between native and potential ecological niches, as for the spotted knapweed (Centaurea stoebe) (Broennimann et al., 2014), or to define the limits to model transferability and predict the distribution of trees under future environmental conditions (Walsh and Hudiburg, 2018).

More recently, the MESS approach has been used to define model uncertainties related to extrapolation (Escobar et al., 2015, Li et al., 2015, Cardador et al., 2016, Luizza et al., 2016, Iannella et al., 2017, Milanesi et al., 2017, Silva et al., 2019) and to identify extrapolation areas where environmental conditions are non-analog to the conditions of model calibration (Fitzpatrick and Hargrove, 2009, Anderson, 2013). Associating uncertainty information with model predictions has been acknowledged as necessary for reliable interpretation of those predictions (Grimm and Berger, 2016, Yates et al., 2018). It is also required to specify the level of risk associated with predictions and to evaluate whether uncertainty can be mitigated to improve model outcomes (Guisan et al., 2013).

This study addresses the importance of extrapolation and associated uncertainties in SDMs generated at broad spatial scale for Southern Ocean species, an analysis that is seldom performed although it is important for characterising model reliability. Using the case study of six abundant and common sea star species of marine benthic communities, the objectives of this work are to evaluate the proportion of extrapolation in wide projection areas, and to provide methodological guidance to mitigate the effects of extrapolation and improve model accuracy.

Section snippets

Studied species and environmental descriptors

The distribution of six sea star species (Asteroidea: Echinodermata) was studied (Table 1). The six species, Acodontaster hodgsoni (Bell, 1908), Bathybiaster loripes (Sladen, 1889), Glabraster antarctica (Smith, 1876), Labidiaster annulatus Sladen, 1889, Odontaster validus Koehler, 1906 and Psilaster charcoti (Koehler, 1906) are abundant and common in benthic communities in the Southern Ocean. The biology, ecology and distribution of these species have been extensively studied and are

Extrapolation and the extent of projection areas

All generated SDMs perform well, with high AUC (AUC > 0.91), TSS (TSS > 0.559) and COR (COR > 0.68) values, low standard deviations and good percentages of correctly classified presence-only test data (77–90%) (Table 2). The descriptors contributing most to SDMs are depth (22–34%), minimum POC (6–21%), POC standard deviation (8–20%), mean ice cover depth (7–17%) and mixed layer depth (3–10%). Contrasts between species are in the respective percentage of contribution of these
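The evaluation statistics reported above can be illustrated with a small sketch. The study's exact protocol (choice of background points, cross-validation scheme, threshold) is not described in this excerpt, so the functions below are hypothetical helpers: AUC is computed as the probability that a presence scores higher than an absence, TSS as sensitivity + specificity - 1 (Allouche et al., 2006) at an assumed threshold, and COR is assumed to be the correlation between predictions and 1/0 observations.

```python
import numpy as np

def auc(pred_pres: np.ndarray, pred_abs: np.ndarray) -> float:
    """AUC: probability that a random presence scores higher than a random absence (ties count 0.5)."""
    diff = pred_pres[:, None] - pred_abs[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def tss(pred_pres: np.ndarray, pred_abs: np.ndarray, threshold: float) -> float:
    """True skill statistic: sensitivity + specificity - 1 at a given threshold."""
    sensitivity = (pred_pres >= threshold).mean()
    specificity = (pred_abs < threshold).mean()
    return float(sensitivity + specificity - 1.0)

def cor(pred_pres: np.ndarray, pred_abs: np.ndarray) -> float:
    """Pearson (point-biserial) correlation between predictions and 1/0 observations."""
    y = np.concatenate([np.ones(len(pred_pres)), np.zeros(len(pred_abs))])
    p = np.concatenate([pred_pres, pred_abs])
    return float(np.corrcoef(p, y)[0, 1])

# Made-up predicted suitabilities at test presences and at absence/background points
pres = np.array([0.9, 0.8, 0.7, 0.4])
absn = np.array([0.6, 0.3, 0.2, 0.1])
print(auc(pres, absn))       # 0.9375
print(tss(pres, absn, 0.5))  # 0.5
print(cor(pres, absn))
```

The 0.5 threshold is arbitrary here; TSS depends on the threshold chosen, whereas AUC and COR do not.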

Modelling performances and extrapolation

SDMs were generated for Southern Ocean sea star species, with contrasting distributions and different numbers of presence-only records available (Table 1, Appendix 1). Overall, species presence-only records are spatially concentrated in the most accessible and visited areas of the Southern Ocean. Most of the sea star samples were collected close to the coasts of the Western Antarctic Peninsula, the Ross Sea and sub-Antarctic Islands such as the Kerguelen Islands. Consequently, high spatial

Conclusions

This study shows that when modelling species distributions over broad areas, such as the Southern Ocean, large proportions of predicted distribution probabilities (suitable or not) are model extrapolations. This extrapolation uncertainty depends on the completeness of species sampling and on how the occupied space used to calibrate the model is defined. Extrapolation occurs in areas where habitat suitability is unknown because no information on species presence or absence is available.

Reducing

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by a “Fonds pour la formation à la Recherche dans l’Industrie et l’Agriculture” (FRIA) and “Bourse fondation de la mer” grants to C. Guillaumot.

This is contribution no. 46 to the vERSO project (www.versoproject.be), funded by the Belgian Science Policy Office (BELSPO, contract n°BR/132/A1/vERSO). Research was also financed by the “Refugia and Ecosystem Tolerance in the Southern Ocean” project (RECTO; BR/154/A1/RECTO) funded by the Belgian Science Policy Office (BELSPO),

References (109)

  • C. Guillaumot et al.

    Broad-scale species distribution models applied to data-poor areas

    Prog. Oceanogr.

    (2019)
  • C. Havermans et al.

    DNA barcoding reveals new insights into the diversity of Antarctic species of Orchomene sensu lato (Crustacea: Amphipoda: Lysianassoidea)

    Deep Sea Res. Part II

    (2011)
  • A. Lomba et al.

    Overcoming the rare species modelling paradox: a novel hierarchical framework applied to an Iberian endemic plant

    Biol. Conserv.

    (2010)
  • M. Marmion et al.

    The performance of state-of-the-art modelling techniques depends on geographical distribution of species

    Ecol. Model.

    (2009)
  • C.E. Marshall et al.

    Species distribution modelling to support marine conservation planning: the next steps

    Mar. Policy

    (2014)
  • H.L. Owens et al.

    Constraints on interpretation of ecological niche models by limited environmental ranges on calibration areas

    Ecol. Model.

    (2013)
  • M.H. Pinkerton et al.

    Spatial and seasonal distribution of adult Oithona similis in the Southern Ocean: predictions using boosted regression trees

    Deep Sea Res. Part I

    (2010)
  • D.R. Stockwell et al.

    Effects of sample size on accuracy of species distribution models

    Ecol. Model.

    (2002)
  • O. Allouche et al.

    Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS)

    J. Appl. Ecol.

    (2006)
  • R.P. Anderson et al.

    The effect of the extent of the study region on GIS models of species geographic distributions and estimates of niche evolution: preliminary tests with montane rodents (genus Nephelomys) in Venezuela

    J. Biogeogr.

    (2010)
  • R.P. Anderson

    A framework for using niche models to estimate impacts of climate change on species distributions

    Ann. N. Y. Acad. Sci.

    (2013)
  • M.B. Araújo et al.

    Standards for distribution models in biodiversity assessments

    Sci. Adv.

    (2019)
  • M. Barbet-Massin et al.

    Selecting pseudo-absences for species distribution models: how, where and how many?

    Methods Ecol. Evol.

    (2012)
  • Z. Basher et al.

    The past, present and future distribution of a deep-sea shrimp in the Southern Ocean

    PeerJ

    (2016)
  • C.M. Beale et al.

    Incorporating uncertainty in predictive species distribution modelling

    Philos. Trans. R. Soc. B: Biol. Sci.

    (2012)
  • F.T. Breiner et al.

    Overcoming limitations of modelling rare species by using ensembles of small models

    Methods Ecol. Evol.

    (2015)
  • O. Broennimann et al.

    Contrasting spatio-temporal climatic niche dynamics during the eastern and western invasions of spotted knapweed in North America

    J. Biogeogr.

    (2014)
  • L. Brotons et al.

    Presence-absence versus presence-only modelling methods for predicting bird habitat suitability

    Ecography

    (2004)
  • L. Brotons et al.

    Modeling bird species distribution change in fire prone Mediterranean landscapes: incorporating species dispersal and landscape dynamics

    Ecography

    (2012)
  • Brueggeman, P., 1998. Underwater Field Guide to Ross Island & McMurdo Sound, Antarctica. The National Science...
  • L. Cardador et al.

    Combining trade data and niche modelling improves predictions of the origin and distribution of non-native European populations of a globally invasive species

    J. Biogeogr.

    (2016)
  • CCAMLR report WG-FSA-15/64, access at https://www.ccamlr.org/fr/wg-fsa-15/64. August...
  • B. Crase et al.

    A new method for dealing with residual spatial autocorrelation in species distribution models

    Ecography

    (2012)
  • C. De Broyer et al.

    Biogeographic atlas of the Southern Ocean

    (2014)
  • M. De Villiers et al.

    Combining field phenological observations with distribution data to model the potential distribution of the fruit fly Ceratitis rosa Karsch (Diptera: Tephritidae)

    Bull. Entomol. Res.

    (2013)
  • M.S. Dhingra et al.

    Global mapping of highly pathogenic avian influenza H5N1 and H5Nx clade 2.3.4.4 viruses with spatial cross-validation

    Elife

    (2016)
  • A. El-Gabbas et al.

    Wrong, but useful: regional species distribution models may not be improved by range-wide data under biased sampling

    Ecol. Evol.

    (2018)
  • J. Elith et al.

    Novel methods improve prediction of species’ distributions from occurrence data

    Ecography

    (2006)
  • J. Elith et al.

    A working guide to boosted regression trees

    J. Anim. Ecol.

    (2008)
  • J. Elith et al.

    The art of modelling range-shifting species

    Methods Ecol. Evol.

    (2010)
  • J. Elith et al.

    A statistical explanation of MaxEnt for ecologists

    Divers. Distrib.

    (2011)
  • S. Fabri-Ruiz et al.

    Can we generate robust species distribution models at the scale of the Southern Ocean?

    Divers. Distrib.

    (2019)
  • S. Fabri-Ruiz et al.

    Benthic ecoregionalization based on echinoid fauna of the Southern Ocean supports current proposals of Antarctic Marine Protected Areas under IPCC scenarios of climate change

    Glob. Change Biol.

    (2020)
  • K.J. Feeley et al.

    Keep collecting: accurate species distribution modelling requires more collections than previously thought

    Divers. Distrib.

    (2011)
  • X. Feng et al.

    Can incomplete knowledge of species’ physiology facilitate ecological niche modelling? A case study with virtual species

    Divers. Distrib.

    (2017)
  • X. Feng et al.

    Physiology in ecological niche modeling: using zebra mussel's upper thermal tolerance to refine model predictions through Bayesian analysis

    Ecography

    (2020)
  • A.H. Fielding et al.

    A review of methods for the assessment of prediction errors in conservation presence/absence models

    Environ. Conserv.

    (1997)
  • M.C. Fitzpatrick et al.

    The projection of species distribution models and the problem of non-analog climate

    Biodivers. Conserv.

    (2009)
  • J.J. Freer et al.

    Predicting future distributions of lanternfish, a significant ecological resource within the Southern Ocean

    Divers. Distrib.

    (2019)
  • R. Gallego et al.

    On the need to consider multiphasic sensitivity of marine organisms to climate change: A case study of the Antarctic acorn barnacle

    J. Biogeogr.

    (2017)
  • Cited by (15)

    • Low vulnerability of the Mediterranean antipatharian Antipathella subpinnata (Ellis & Solander, 1786) to ocean warming

      2023, Ecological Modelling
      Citation Excerpt :

      The proportion of correctly classified test data also showed high values for the four groups of occurrence data used for spatial cross-validation, with 98.42 ± 2.49% (mean ± se) for group 1, 98.19 ± 2.81% for group 2, 99.60 ± 1.47% for group 3 and 96.69 ± 4.04% for group 4. The proportion of extrapolated area was 65.86% (Fig. 3), supporting the use of the MESS method (Guillaumot et al., 2019, 2020a). In addition to the occurrence data of A. subpinnata in the Tyrrhenian, Ligurian, Adriatic, Ionian and Aegean Seas (Fig. S1), the model projected very high presence probabilities in the Alboran Sea (South coast of Spain and North coast of Morocco), along the Algerian coast, around the Balearic Islands, along the East coast of the Adriatic Sea (Coasts of Croatia, Bosnia and Herzegovina, Montenegro, Albania) and in the South coast of Greece, near Athens (Fig. 3).
