Evaluation of pre-processing and variable selection on energy dispersive X-ray fluorescence spectral data with partial least square regression: A case of study for soil organic carbon prediction

https://doi.org/10.1016/j.sab.2020.106016

Highlights

  • Poisson scaling + mean center was suitable for pre-processing EDXRF spectral data.

  • Variable selection highlighted the spectra interval which influenced the models.

  • SOC was determined with accuracy and precision using EDXRF spectral data.

Abstract

Most studies that report soil fertility attributes using energy dispersive X-ray fluorescence (EDXRF) combined with multivariate calibration rely on elemental concentration data. This approach may discard relevant information contained in the EDXRF spectra. However, a well-established procedure for treating soil EDXRF spectral data for multivariate calibration is not currently available. The objective of this study was to evaluate the influence of different pre-processing and variable selection methods on partial least squares regression models built with EDXRF spectral data. Measurements were obtained under two experimental conditions (tube voltages of 15 kV and 50 kV) for soil organic carbon determination. Poisson scaling + mean center proved to be the most suitable pre-processing for this data set. Variable selection by the successive projections algorithm for interval selection in partial least squares improved the performance of all tested pre-processing methods, or at least kept the errors unchanged. The 15 kV models with Pareto scaling and Poisson scaling + mean center were the most accurate and precise, with a ratio of performance to deviation of 2.2. The figures of merit demonstrated the feasibility of determining soil organic carbon from EDXRF spectral data with these pre-processing methods, since the accuracy, precision and limits of detection were consistent with previous reports. Thus, this study contributes toward establishing an approach for treating soil EDXRF spectral data for multivariate calibration. It also contributes to a better interpretation of the EDXRF variables that affect soil organic carbon modeling, demonstrating the potential of the proposed methodology.
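The ratio of performance to deviation (RPD) reported above is commonly computed as the standard deviation of the reference values divided by the root mean square error of prediction (RMSEP). A minimal Python sketch under that assumption follows. The function names and the SOC values are hypothetical and are not taken from the study, and some authors use a slightly different standard deviation convention.

```python
import math
import statistics


def rmsep(y_ref, y_pred):
    """Root mean square error of prediction for paired reference/predicted values."""
    if len(y_ref) != len(y_pred):
        raise ValueError("y_ref and y_pred must have the same length")
    squared_errors = [(r - p) ** 2 for r, p in zip(y_ref, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rpd(y_ref, y_pred):
    """Ratio of performance to deviation: sample SD of the reference values / RMSEP."""
    return statistics.stdev(y_ref) / rmsep(y_ref, y_pred)


# Hypothetical SOC values in g/kg, for illustration only (not data from the study).
soc_reference = [10.0, 15.0, 20.0, 25.0, 30.0]
soc_predicted = [11.0, 14.0, 21.0, 24.0, 31.0]

example_rmsep = rmsep(soc_reference, soc_predicted)
example_rpd = rpd(soc_reference, soc_predicted)
```

A larger RPD means the prediction error is small relative to the natural spread of the reference values, which is why the two quantities are usually reported together.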

Introduction

EDXRF equipment emits X-ray beams that may interact with matter through the photoelectric effect and through elastic and inelastic scattering [1]. Fluorescence is related to the photoelectric effect, in which X-rays with energies above an element's absorption edge may remove electrons from the inner shells of the atom. This excites the atom and produces characteristic X-rays with energies specific to each element. Their intensities are related to the element's mass fraction [2]. Moreover, the intensity for each element depends on the fluorescence yield, which is the relative frequency with which a characteristic X-ray, rather than an Auger electron, is emitted after the atom is excited. The Auger process predominates for elements with atomic numbers Z < 20 [3].

Rayleigh, Compton, and Raman scattering may occur simultaneously with the photoelectric effect [4,5]. Scattering occurs for all atoms and depends on several factors, such as sample elemental composition, sample density, experiment geometry, and scatter factors [6]. Within the energy range in which most EDXRF equipment operates, the scattering cross-section relative to the photoelectric cross-section is greater for low-Z elements (e.g. H, C, N, O) [1,7]. Because of its low photoelectric cross-section, carbon is not directly determined by EDXRF. However, the scattering regions of EDXRF spectra carry information on this element, which may be obtained indirectly with multivariate analysis tools [8–11].

Recent studies have demonstrated the potential of EDXRF combined with multivariate analysis to assess soil fertility attributes [7,12–16], including soil organic carbon (SOC), which plays a crucial role in soil fertility and environmental protection [10,17,18]. Nonetheless, most of these studies have employed elemental concentration data.

Commonly, quantitative software developed by the equipment manufacturer is used to determine concentrations. Each piece of equipment has its own quantitative routines for spectra acquisition and processing. Although these quantitative packages are user-friendly, they are expensive and not fully transparent [19]. Because of the complexity of matrix effects in soil samples, the concentrations reported by these routines might not be accurate, which may lead to misinterpretations or poor prediction models. Additionally, using only the elemental concentration data does not take into account scattering effects, whose contribution to the characterization of complex organic samples has been demonstrated [6,8,10,20–22]. In this sense, EDXRF spectral data provide a broader data set with more relevant information than concentration figures alone. However, an established or widely adopted procedure for pre-processing soil EDXRF spectral data for multivariate calibration is not available. In addition, not all variables in the spectrum contribute to the model. In this context, variable selection methods may improve the models' predictions and enable a simpler interpretation of the variables' contributions [7,23,24].

The successful determination of spectrally active soil components by any spectroscopic technique depends on selecting an adequate multivariate calibration technique, which, combined with a thorough evaluation of the pre-processing steps and variable selection methods, could improve the models' predictive ability. Many pre-processing options may be applied individually or in sequence to spectral data, and several variable selection methods are available. However, there is a lack of studies with EDXRF data that compare the performance of calibration models and, at the same time, seek a chemical and physical interpretation that justifies a specific combination of these processes.

Therefore, the purpose of this study was to evaluate different pre-processing and variable selection techniques applied to EDXRF spectra for SOC determination using partial least squares (PLS) regression, aiming to improve the models' predictive ability and to interpret the spectral ranges that contribute most to SOC determination.
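For readers less familiar with PLS regression, the sketch below shows one textbook formulation, PLS1 fitted by the NIPALS algorithm with mean centering, written with NumPy. It is an illustration under stated assumptions, not the routine used by the authors: the function names pls1_fit and pls1_predict are hypothetical, and the data are synthetic and noise-free so that a model with as many components as variables must reproduce y, which serves as a sanity check.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (NIPALS) on mean-centered data.

    Returns the regression vector and the means needed for prediction.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # centered residual blocks
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_components))     # weight vectors
    P = np.zeros((n_vars, n_components))     # X loadings
    q = np.zeros(n_components)               # y loadings
    for a in range(n_components):
        w = E.T @ f                          # direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                            # scores for this component
        tt = t @ t
        p = E.T @ t / tt
        q[a] = f @ t / tt
        E = E - np.outer(t, p)               # deflate X
        f = f - q[a] * t                     # deflate y
        W[:, a], P[:, a] = w, p
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector for centered X
    return coef, x_mean, y_mean


def pls1_predict(X, coef, x_mean, y_mean):
    """Predict the response for new rows of X."""
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean


# Synthetic, noise-free data (hypothetical; not EDXRF spectra).
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 4.0

coef, x_mean, y_mean = pls1_fit(X, y, n_components=5)
y_hat = pls1_predict(X, coef, x_mean, y_mean)
```

In practice the number of components is chosen by cross-validation and is much smaller than the number of spectral channels. Any pre-processing other than mean centering would be applied to X before the fit.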

Section snippets

Study area and soil sampling

The study area (23°09′59.70″S, 51°14′42.27″W to 23°09′50.14″S, 51°14′42.27″W) is located in the municipality of Cambé, in the north of Paraná State, Brazil. The soils were classified as Red Latosol (Rhodic Ferralsol) and Red Nitosol (Rhodic Nitisol) according to the Brazilian classification system and the FAO classification [25,26]. Both soils are highly weathered and derived from basalt, with a very clayey texture and a predominance of kaolinite and iron and aluminum oxyhydroxides. Cultures used are principally

SOC conventional analysis and EDXRF spectra

The descriptive statistics of SOC measured by the dichromate oxidation method for all samples, the calibration set, and the prediction set are presented in Fig. 1. The SOC values of the calibration set ranged from 7.6 to 30.5 g kg⁻¹, with a mean of 20.0 g kg⁻¹ and a coefficient of variation (CV) of 0.22. The prediction set ranged from 9.5 to 29.2 g kg⁻¹, with a mean of 19.4 g kg⁻¹ and a CV of 0.21. The CV values for these datasets indicated that the SOC values were moderately

Conclusions

The results of this study demonstrated that, for both measurement conditions (15 kV and 50 kV), Pareto scaling and Poisson scaling + mean center were the pre-processing methods that yielded the lowest root mean square error of prediction (RMSEP). Although these pre-processing methods showed equivalent accuracy, Poisson scaling was more consistent for this data set, given the Poisson nature of XRF counting data.
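The two pre-processing options named above can be sketched in a few lines of NumPy. The exact Poisson scaling definition used by the authors is not given in this excerpt, so the variant below, which divides each channel by the square root of its mean count, is an assumption. Pareto scaling follows its usual definition (mean centering, then division by the square root of the standard deviation). The function names and the simulated counts are hypothetical.

```python
import numpy as np


def mean_center(X):
    """Subtract the column (channel) mean from every spectrum."""
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=0)


def poisson_scale(X):
    """Assumed variant: divide each channel by the square root of its mean count."""
    X = np.asarray(X, dtype=float)
    scale = np.sqrt(X.mean(axis=0))
    scale[scale == 0] = 1.0                  # leave empty channels unchanged
    return X / scale


def pareto_scale(X):
    """Mean-center, then divide each channel by the square root of its SD."""
    X = np.asarray(X, dtype=float)
    scale = np.sqrt(X.std(axis=0, ddof=1))
    scale[scale == 0] = 1.0                  # leave constant channels unchanged
    return (X - X.mean(axis=0)) / scale


# Simulated counts for three channels with very different intensities
# (hypothetical values, not EDXRF measurements).
rng = np.random.default_rng(0)
true_rates = np.array([4.0, 100.0, 2500.0])
spectra = rng.poisson(true_rates, size=(2000, 3))

poisson_scaled = poisson_scale(spectra)
poisson_centered = mean_center(poisson_scaled)   # "Poisson scaling + mean center"
pareto_scaled = pareto_scale(spectra)
```

Because the variance of a Poisson count equals its mean, dividing by the square root of the mean brings the counting noise in every channel to roughly unit variance, so intense peaks no longer dominate the model purely through their larger noise. Pareto scaling only partially reduces the differences in spread between channels.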

The iSPA-PLS variable selection method improved the models' predictive ability with mean centering and Pareto scaling. However, the

Author Contributions

Conceptualization: F.R.S. and F.L.M.; methodology and software: F.R.S., F.L.M. and E.B.; validation: J.F.O. and F.L.M.; formal analysis: F.R.S., J.F.O. and F.L.M.; investigation: F.R.S.; resources: F.R.S., F.L.M. and G.M.C.B.; data curation: F.R.S.; writing—original draft preparation: F.R.S. and F.L.M.; writing—review and editing: F.R.S., J.F.O., E.B., G.M.C.B. and F.L.M.; visualization: F.R.S. and F.L.M.; supervision: G.M.C.B. and F.L.M.; project administration: F.R.S. and F.L.M.; funding

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors acknowledge the support of CNPq (PhD scholarship, grant number 142985/2016-3, and CNPq 304722/2017-0), INCT (464898/2014-5) and the IBITIBA research project (32.02.120.00.00). The authors also thank Dr. Adriano de Araújo Gomes for the iSPA-PLS routine.

References (63)

  • G.A. Lisboa Nogueira et al.

    Determination of metal content in industrial powder ink and paint thickness over steel plates using X-ray fluorescence

    Appl. Radiat. Isot.

    (2019)
  • M.R. Keenan et al.

    Optimal scaling of TOF-SIMS spectrum-images prior to multivariate statistical analysis

    Appl. Surf. Sci.

    (2004)
  • J. Engel et al.

    Breaking with trends in pre-processing?

    TrAC Trends Anal. Chem.

    (2013)
  • Å. Rinnan et al.

    Review of the most common pre-processing techniques for near-infrared spectra

    TrAC Trends Anal. Chem.

    (2009)
  • P. Geladi et al.

    Partial least-squares regression: a tutorial

    Anal. Chim. Acta

    (1986)
  • R.A. Viscarra Rossel et al.

    Determining the composition of mineral-organic mixes using UV-vis-NIR diffuse reflectance spectroscopy

    Geoderma

    (2006)
  • M.H. Ferreira et al.

    Development and validation of a chemometric method for direct determination of hydrochlorothiazide in pharmaceutical samples by diffuse reflectance near infrared spectroscopy

    Microchem. J.

    (2013)
  • A.C. Olivieri

    Practical guidelines for reporting results in single- and multi-component analytical calibration: a tutorial

    Anal. Chim. Acta

    (2015)
  • V. Centner et al.

    Detection of nonlinearity in multivariate calibration

    Anal. Chim. Acta

    (1998)
  • L.A. Currie

    Nomenclature in evaluation of analytical methods including detection and quantification capabilities (IUPAC recommendations 1995)

    Anal. Chim. Acta

    (1999)
  • H. Van der Voet

    Comparing the predictive accuracy of models using a simple randomization test

    Chemom. Intell. Lab. Syst.

    (1994)
  • Z. Xiaobo et al.

    Use of FT-NIR spectrometry in non-invasive measurements of soluble solid contents (SSC) of ‘Fuji’ apple based on different PLS models

    Chemom. Intell. Lab. Syst.

    (2007)
  • M.C.U. Araújo et al.

    The successive projections algorithm for variable selection in spectroscopic multicomponent analysis

    Chemom. Intell. Lab. Syst.

    (2001)
  • A. de Araújo Gomes et al.

    The successive projections algorithm for interval selection in PLS

    Microchem. J.

    (2013)
  • A.W. Warrick et al.

    Spatial variability of soil physical properties in the field

  • M. Kleber et al.

    Chapter One - Mineral–Organic Associations: Formation, Properties, and Relevance in Soil Environments

    (2015)
  • C. Gomez et al.

    Soil organic carbon prediction by hyperspectral remote sensing and field Vis-NIR spectroscopy: an Australian case study

    Geoderma

    (2008)
  • M. Vohland et al.

    Comparing different multivariate calibration methods for the determination of soil organic carbon pools with visible to near infrared spectroscopy

    Geoderma

    (2011)
  • N.J. Sithole et al.

    Robust vis-NIRS models for rapid assessment of soil organic carbon and nitrogen in Feralsols haplic soils from different tillage management practices

    Comput. Electron. Agric.

    (2018)
  • R.S. Bricklemyer et al.

    On-the-go VisNIR: potential and limitations for mapping soil clay and organic carbon

    Comput. Electron. Agric.

    (2010)
  • B. Stenberg et al.

    Visible and near infrared spectroscopy in soil science
