Elsevier

Geoderma Regional

Volume 24, March 2021, e00353
Harvesting spatially dense legacy soil datasets for digital soil mapping of available water capacity in Southern France

https://doi.org/10.1016/j.geodrs.2020.e00353

Highlights

  • We used dense samplings of legacy soil observations to map Soil Available Water Capacity (SAWC) over an irrigated perimeter.

  • The best results were obtained with Quantile Random Forest (QRF) using geographical locations as additional soil covariates.

  • Feeding QRF with 2781 additional auger hole observations greatly increased the accuracy and resolution of the SAWC map.

  • QRF performances were, however, limited by the observational uncertainties of the SAWCs determined from auger holes.

  • Extending SAWC maps to the whole irrigated perimeter would require automating the digitization of auger hole observations.

Abstract

Although considerable work has been conducted in recent decades to build soil databases, the legacy data from many former soil survey campaigns remain unused. The objective of this study was to assess the value of harvesting such legacy data for mapping soil available water capacities (SAWCs) at different rooting depths (30 cm, 60 cm, 100 cm) and to the maximal observation depth, over the commune of Bouillargues (16 km2, Occitanie region, southern France).

An increasing number of available auger hole observations with SAWC estimations – from 0 to 2781 observations – were added to the existing soil profiles to calibrate quantile regression forests (QRFs) using the Euclidean buffer distances from the sites as soil covariates. The SAWC was first mapped separately for different soil layers, and the mapping outputs were pooled to estimate the required SAWC. The uncertainty of the SAWC prediction was estimated from the estimated mapping uncertainties of the individual soil layers by an error propagation model using a first-order Taylor analysis.
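The error propagation step can be sketched as follows. This is only an illustration under an assumption that the abstract does not state: the pooled SAWC $S$ is taken to be the plain sum of the layer values $s_i$. The symbols $\sigma_i$ (mapping standard deviation of layer $i$) and $\rho_{ij}$ (correlation between the prediction errors of layers $i$ and $j$, with $\rho_{ii} = 1$) are hypothetical notation introduced here.

```latex
S = \sum_{i=1}^{n} s_i , \qquad
\sigma_S^2 \approx \sum_{i=1}^{n} \sum_{j=1}^{n}
  \frac{\partial S}{\partial s_i}\,\frac{\partial S}{\partial s_j}\,
  \rho_{ij}\,\sigma_i\,\sigma_j
  = \sum_{i=1}^{n} \sigma_i^2 + 2 \sum_{i<j} \rho_{ij}\,\sigma_i\,\sigma_j .
```

For a plain sum every partial derivative equals 1, so the first-order expansion reduces to the right-hand side; if the layer errors were uncorrelated, only the sum of the layer variances would remain.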

The performances of the SAWC predictions and their uncertainties were evaluated with a 10-fold cross validation that was iterated 20 times. The results showed that a quantile regression forest fed with auger hole observations and using the Euclidean buffer distances as soil covariates considerably improved the performances of the SAWC predictions (percentages of explained variance from 0.39 to 0.70) compared to a classical DSM approach, i.e., a QRF that used only soil profiles and only environmental covariates (percentages of explained variance from 0.04 to 0.51). The analysis of the results revealed that the performances also depended on the spatial patterns of the different examined SAWCs and were limited by the observational uncertainties of the SAWCs determined from auger holes. The best-performing approach also tended to provide the best view of the uncertainty patterns, although uncertainty was overestimated.
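The evaluation protocol described above (10 folds, 20 iterations) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the learner `fit_line` is a placeholder standing in for a QRF, the data are synthetic, and the explained-variance score is assumed to be 1 - SSE/SST, which the abstract does not define.

```python
import random
from statistics import mean


def explained_variance(observed, predicted):
    """Assumed score: 1 - SSE/SST (not defined in the abstract)."""
    obs_mean = mean(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - sse / sst


def repeated_kfold_indices(n, k=10, repeats=20, seed=0):
    """Yield (repeat, train_idx, test_idx); the data are reshuffled at each repeat."""
    rng = random.Random(seed)
    for r in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        for f in range(k):
            test = order[f::k]  # every k-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield r, train, test


def cross_validate(x, y, fit, k=10, repeats=20, seed=0):
    """Return one score per repeat, computed on the pooled held-out predictions."""
    scores, preds = [], {}
    for _, train, test in repeated_kfold_indices(len(y), k, repeats, seed):
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = model(x[i])
        if len(preds) == len(y):  # all k folds of this repeat are done
            scores.append(explained_variance(y, [preds[i] for i in range(len(y))]))
            preds = {}
    return scores


def fit_line(xs, ys):
    """Placeholder learner (one-covariate least squares), standing in for a QRF."""
    mx, my = mean(xs), mean(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return lambda v: my + slope * (v - mx)


# Synthetic, noise-free data (not from the study).
x = [float(i) for i in range(50)]
y = [2.0 * v + 1.0 for v in x]
scores = cross_validate(x, y, fit_line)
```

Pooling the held-out predictions within each repeat gives one score per iteration, so the spread across the 20 scores indicates how sensitive the result is to the random fold assignment.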

Despite these gains in performance, the cost-efficiency analysis showed that the augmentation of soil observations was not cost efficient because of the highly time-consuming manual data harvesting protocol. However, this result did not account for the observed gain in map details. Furthermore, the cost efficiency could be further improved by automation.

Introduction

Digital soil mapping (DSM) has been recognized as the appropriate solution to provide spatial soil information for land users, scientific communities, and policy and decision makers in agriculture and the environment (McBratney et al., 2003; Sanchez et al., 2009). The principle of DSM is to predict a soil property or soil class, and the associated prediction uncertainty, by determining the quantitative relationships between the soil information available over a limited set of locations and the spatial data reflecting the state factors of soil formation (environmental covariates). DSM has now moved from a largely academic movement toward an operational activity (Minasny and McBratney, 2016; Arrouays et al., 2017a,b).

However, DSM predictions of soil properties often exhibit more uncertainty than initially expected. For example, percentages of explained variance below 0.5 were observed for 95%, 76%, 100% and 86% of the tested soil properties in DSM applications at the catchment scale (Nussbaum et al., 2018), at the regional scale (Vaysse and Lagacherie, 2015), at the national scale (Mulder et al., 2016), and at the global scale (Hengl et al., 2014), respectively.

These authors converged toward the conclusion that the density of soil observations used for calibrating the DSM models was the main factor that limited the DSM performances. Most of the soil information used as input in DSM applications has been either soil maps or the spatial sampling of sites with soil property measurements. The average densities used in most operational DSM applications have been low, e.g., 4–12 sites/km2 (several study areas in Nussbaum et al., 2018), 0.07 sites/km2 (Vaysse and Lagacherie, 2015), 0.03 sites/km2 (Mulder et al., 2016), and 0.001 sites/km2 (Hengl et al., 2014), which limits the performances of soil prediction, especially when the pattern of variation in the soil property is largely below the spacing of soil profiles (Vaysse and Lagacherie, 2015; Gomez and Coulouma, 2018). In addition, further experiments that consisted of varying the spatial density of soil input confirmed this analysis (Somarathna et al., 2017; Wadoux et al., 2019; Lagacherie et al., 2020). Consequently, it is of paramount importance to increase the density of soil inputs to improve the performance of DSM models in predicting soil properties (Voltz et al., 2020).

The most straightforward way to increase the density of DSM model soil inputs involves harvesting the legacy soil data that have not yet been stored in the existing soil databases. Arrouays et al. (2017a,b) showed that during the period 2009–2015, the numbers of legacy soil profiles stored in global and national soil databases increased by 1046% and 45%, respectively. However, they estimated that a large amount of soil legacy data can still be harvested. This is even more true in some areas across the world where soil surveying has been particularly active in the past.

For example, in southern France, the BRL irrigation company conducted detailed soil surveys over its irrigation perimeter between 1957 and 1992, which resulted in detailed soil maps, 25,000 soil profiles (5/km2) and 203,000 auger hole observations (31/km2). At this stage, such soil data have not yet been harvested and therefore cannot be used as input for DSM applications. However, these data have great potential for improving DSM performance and should be thoroughly examined.

In this paper, a spatially dense set of soil observations harvested from soil survey documents was tested for improving the performances of DSM models in mapping soil available water capacities for different rooting depths (0–30 cm, 0–60 cm, 0–100 cm) and to the maximum observation depth, together with the associated uncertainties. Our aim was to evaluate the cost-efficiency ratio of using such soil observations and the added value of using Euclidean buffer distances as additional inputs of DSM models, as proposed by Hengl et al. (2018). The study was conducted in the commune of Bouillargues, which is one of the communes included in the BRL irrigation perimeter.
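The idea of Euclidean buffer distances as covariates can be illustrated with a short sketch: each location to be predicted receives one extra covariate per observation site, holding its distance to that site. The function name `buffer_distances` and the coordinates below are hypothetical, and the sketch assumes projected coordinates in metres; it is not the implementation used in the study.

```python
import math


def buffer_distances(targets, sites):
    """Return one row per target location with its Euclidean distance to every site."""
    return [
        [math.hypot(tx - sx, ty - sy) for sx, sy in sites]
        for tx, ty in targets
    ]


# Hypothetical projected coordinates in metres (not from the study).
sites = [(0.0, 0.0), (300.0, 400.0)]       # observation sites (profiles, auger holes)
grid_cells = [(0.0, 0.0), (300.0, 0.0)]    # locations where SAWC is to be predicted

# Each row can be appended to the environmental covariates of the matching grid cell.
distance_covariates = buffer_distances(grid_cells, sites)
```

The number of such covariates grows with the number of observation sites, which is worth keeping in mind when thousands of auger holes are added.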

The study area

This study took place in the administrative commune of Bouillargues in the Occitanie administrative French region (Fig. 1). Located in southern France, Bouillargues covers 16 km2 and is mainly devoted to vineyards, agricultural lands, forests, and scrublands. Bouillargues has a Mediterranean climate characterized by a moderate average annual rainfall (600 mm) and dry and hot summers.

The study area is topographically split into two subregions with the large flat valley of the Vistrenque in the

DSM models for soil profiles

In this study, we used several mapping models derived from the random forest algorithm. Hereafter, we provide a general description of random forest and its derivatives used in this study.

Preliminary results

In Fig. 6, we present the distributions of SAWC30, SAWC60, SAWC100 and SAWCmax for the soil profiles (left panel of Fig. 6) and auger holes (right panel of Fig. 6). We first observed that the distributions of SAWC regardless of the considered soil depth were bimodal for both the soil profiles and auger holes, with i) a higher peak for higher values of SAWC30 and SAWC60 and with ii) a higher peak for lower values of SAWC100 and SAWCmax. Additionally, it is worth noting that both the SAWC ranges

Soil available water capacity

The selected case study considered the soil available water capacity, which is among the soil properties most in demand by end users, as the targeted soil property (Richer-de-Forges et al., 2019). This paper completes the small set of papers that were devoted to the digital mapping of SAWC (Hong et al., 2013; Malone et al., 2009; Padarian et al., 2014; Poggio et al., 2010; Román Dobarco et al., 2019; Ugbaje and Reuter, 2013; Amirian-Chakan et al., 2019) and the even smaller set of papers

Conclusion

In this study, the main lessons were as follows:

  • A QRF approach using Euclidean buffer distances outperformed a classical QRF approach in predicting SAWC with a dense set of profiles and auger holes.

  • The addition of a dense spatial sampling of auger hole observations dramatically increased the performance in predicting SAWCs and increased the spatial resolutions of the SAWC pattern delineations, but there were limitations due to the uncertainty of the auger hole observations.

  • The performances in

Declaration of Competing Interest

None.

Acknowledgments

This research was funded by the French National Research Institute for Agriculture, Food and Environment (INRAE), the French Research and Technology Agency (ANRT) and BRL Exploitation. Philippe Lagacherie is a collaborator of the GLADSOILMAP Consortium supported by the LE STUDIUM Loire Valley Institute for Advanced Research Studies.

References

  • D.L. Shrestha et al.

    Machine learning approaches for estimation of prediction interval for the model output

    Neural Netw.

    (2006)
  • K. Vaysse et al.

    Evaluating digital soil mapping approaches for mapping GlobalSoilMap soil properties from legacy data in Languedoc-Roussillon (France)

    Geoderma Reg.

    (2015)
  • K. Vaysse et al.

    Using quantile regression forest to estimate uncertainty of digital soil mapping products

    Geoderma

    (2017)
  • D.J.J. Walvoort et al.

    An R package for spatial coverage sampling and random sampling from compact geographical strata by k-means

    Comput. Geosci.

    (2010)
  • A. Alfons

    cvTools: Cross-Validation Tools for Regression Models

    (2012)
  • D. Arrouays et al.

    Digital soil mapping across the globe

    Geoderma Reg.

    (2017)
  • D. Arrouays et al.

    Soil legacy data rescue via GlobalSoilMap and other international and national initiatives

    GeoResJ.

    (2017)
  • D. Arrouays et al.

    The GlobalSoilMap project specifications

    GlobalSoilMap: Basis of the Global Spatial Soil Information System - Proceedings of the 1st GlobalSoilMap Conference

    (2014)
  • D. Baize et al.

    Guide des sols

    (1995)
  • J. Böhner et al.

    SAGA — analysis and modelling applications

  • J. Bourrier

    La mesure des caractéristiques hydrodynamiques des sols par la méthode Vergière.

    Bull. Tech. du Génie Rural

    (1965)
  • L. Breiman

    Random forests

    Mach. Learn.

    (2001)
  • D.J. Brus et al.

    Sampling for validation of digital soil maps

    Eur. J. Soil Sci.

    (2011)
  • E. Dominati et al.

    A soil change-based methodology for the quantification and valuation of ecosystem services from agro-ecosystems: a case study of pastoral agriculture in New Zealand

    Ecol. Econ.

    (2014)
  • C. Gomez et al.

    Importance of the spatial extent for using soil properties estimated by laboratory VNIR/SWIR spectroscopy: examples of the clay and calcium carbonate content

    Geoderma

    (2018)
  • T. Hengl

    GSIF: Global Soil Information Facilities

  • T. Hengl et al.

    SoilGrids1km - Global soil information based on automated mapping

    PLoS ONE

    (2014)