Evaluation of pre-processing and variable selection on energy dispersive X-ray fluorescence spectral data with partial least square regression: A case of study for soil organic carbon prediction
Introduction
EDXRF equipment emits X-ray beams that may interact with matter through the photoelectric effect and through elastic and inelastic scattering [1]. Fluorescence is related to the photoelectric effect, in which X-rays with energies higher than the absorption edge of an element may remove electrons from the inner shells of the atom. This excites the atom and leads to the emission of characteristic X-rays with energies specific to each element, whose intensities are related to the element's mass fraction [2]. Moreover, the intensity for each element depends on the fluorescence yield, which is the probability that the excited atom relaxes by emitting a characteristic X-ray rather than an Auger electron. The Auger process predominates for elements with atomic numbers Z < 20 [3].
Rayleigh, Compton, and Raman scattering may occur simultaneously with the photoelectric effect [4,5]. Scattering occurs for all atoms and depends on several factors, such as sample elemental composition, sample density, experiment geometry, and scatter factors [6]. Within the energy range in which most EDXRF equipment operates, the ratio of the scattering cross-section to the photoelectric cross-section is greater for low-Z elements (e.g. H, C, N, O) [1,7]. Because of its low photoelectric cross-section, carbon is not directly determined by EDXRF. However, the scattering regions of the EDXRF spectra carry information about this element, which may be obtained indirectly using multivariate analysis tools [8–11].
Recent studies demonstrated the potential of energy dispersive X-ray fluorescence (EDXRF) combined with multivariate analysis to assess soil fertility attributes [7,12–16], including soil organic carbon (SOC), which plays a crucial role in soil fertility and environmental protection [10,17,18]. Nonetheless, most of these studies have employed elemental concentration data.
Commonly, quantitative software developed by the equipment manufacturer is used to determine concentrations. Each piece of equipment has its own quantitative routines for spectra acquisition and processing. Although these quantitative packages are user-friendly, they are expensive and not fully transparent [19]. Because of the complexity of matrix effects in soil samples, the concentrations reported by these routines might not be accurate, which may lead to misinterpretations or poor prediction models. Additionally, using only the elemental concentration data does not take into account scattering effects, whose contribution to the characterization of complex organic samples has been demonstrated [6,8,10,20–22]. In this sense, the EDXRF spectral data constitute a broader data set, with more relevant information, than a set of concentration values. However, there is no established or widely adopted procedure for pre-processing soil EDXRF spectral data for multivariate calibration. In addition, not all variables in the spectrum contribute to the model. In this context, variable selection methods may improve the models' predictions and simplify the interpretation of each variable's contribution [7,23,24].
The successful determination of spectrally active soil components by any spectroscopic technique depends on selecting an adequate multivariate calibration technique which, combined with a thorough evaluation of the pre-processing steps and variable selection methods, could improve the models' predictive ability. Many pre-processing options may be applied to spectral data, individually or in sequence, and several variable selection methods are available. However, there is a lack of studies with EDXRF data that compare the performance of calibration models and, at the same time, seek a chemical and physical interpretation that justifies a specific combination of these processes.
Therefore, the purpose of this study was to evaluate different pre-processing and variable selection techniques applied to EDXRF spectra for SOC determination using partial least squares regression (PLS), aiming to improve the models' predictive ability and to interpret the spectral ranges that contribute most to the SOC determination.
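To make the calibration and prediction workflow concrete, the following is a minimal sketch of a single-response PLS model fitted with the textbook NIPALS algorithm and scored by the root mean square error of prediction (RMSEP). It is not the authors' implementation. The spectra are synthetic Poisson counts, and the peak positions, amplitudes, number of components, and the function names `pls1_fit`, `pls1_predict`, and `rmsep` are all assumptions made for illustration.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm.

    Returns the regression vector and the means needed for prediction.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean           # mean-centred working copies
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                         # weights: covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                            # scores
        tt = t @ t
        p = Xr.T @ t / tt                     # X loadings
        c = yr @ t / tt                       # y loading
        Xr = Xr - np.outer(t, p)              # deflate X
        yr = yr - c * t                       # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)    # b = W (P'W)^-1 q
    return coef, x_mean, y_mean


def pls1_predict(X, coef, x_mean, y_mean):
    """Predict the response for new spectra."""
    return (X - x_mean) @ coef + y_mean


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic data (assumption): 60 spectra of 200 channels in which the
# height of a broad scattering peak depends on a hypothetical SOC value,
# while a fluorescence peak varies independently of it.
rng = np.random.default_rng(0)
channels = np.arange(200)
soc = rng.uniform(8.0, 30.0, size=60)         # hypothetical SOC, g/kg
scatter = np.exp(-0.5 * ((channels - 150) / 8.0) ** 2)
fluor = np.exp(-0.5 * ((channels - 60) / 4.0) ** 2)
fluor_amp = 3000.0 * rng.uniform(0.8, 1.2, size=60)
expected = (50.0
            + np.outer(2000.0 - 30.0 * soc, scatter)
            + np.outer(fluor_amp, fluor))
spectra = rng.poisson(expected).astype(float)  # counting noise

# First 40 spectra form the calibration set, the last 20 the prediction set.
coef, x_mean, y_mean = pls1_fit(spectra[:40], soc[:40], n_components=3)
soc_hat = pls1_predict(spectra[40:], coef, x_mean, y_mean)
rmsep_value = rmsep(soc[40:], soc_hat)
print(f"RMSEP on the synthetic prediction set: {rmsep_value:.2f} g/kg")
```

In practice the number of latent variables would be chosen by cross-validation on the calibration set rather than fixed in advance, and any pre-processing would be estimated on the calibration spectra only and then applied unchanged to the prediction spectra.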
Study area and soil sampling
The study area (23°09′59.70″S, 51°14′42.27″W to 23°09′50.14″S, 51°14′42.27″W) is located in the municipality of Cambé, in the north of Paraná State, Brazil. The soils were classified as Red Latosol (Rhodic Ferralsol) and Red Nitosol (Rhodic Nitisol) according to the Brazilian classification system and the FAO classification [25,26]. Both soils are highly weathered, derived from basalt, with a very clayey texture and a predominance of kaolinite and iron and aluminum oxyhydroxides. Cultures used are principally
SOC conventional analysis and EDXRF spectra
The descriptive statistics of SOC measured by the dichromate oxidation method for all samples, the calibration set, and the prediction set are presented in Fig. 1. The SOC values of the calibration set ranged from 7.6 to 30.5 g kg⁻¹, with a mean of 20.0 g kg⁻¹ and a coefficient of variation (CV) of 0.22. The prediction set ranged from 9.5 to 29.2 g kg⁻¹, with a mean of 19.4 g kg⁻¹ and a CV of 0.21. The CV values for these datasets indicated that the SOC values were moderately
Conclusions
The results of this study demonstrated that, for both measurement conditions (15 kV and 50 kV), Pareto scaling and Poisson scaling + mean centering were the pre-processing methods that yielded the lowest RMSEP. Although these pre-processing methods showed equivalent accuracy, Poisson scaling was more consistent for this data set, given the Poisson nature of the XRF counting process.
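The pre-processing methods named above can be sketched in a few lines of NumPy. The definitions used here are the ones common in chemometrics and are assumptions, since the paper's own formulas are not shown in this excerpt: Pareto scaling divides each mean-centred channel by the square root of its standard deviation, and Poisson scaling divides each channel by the square root of its mean count. The function names are hypothetical.

```python
import numpy as np


def mean_center(X):
    """Subtract the mean of each channel (column)."""
    return X - X.mean(axis=0)


def pareto_scale(X):
    """Mean-centre, then divide each channel by the square root of its
    standard deviation (assumed definition of Pareto scaling)."""
    sd = X.std(axis=0, ddof=1)
    scale = np.where(sd > 0, np.sqrt(sd), 1.0)   # leave constant channels unscaled
    return (X - X.mean(axis=0)) / scale


def poisson_scale(X):
    """Divide each channel by the square root of its mean count
    (assumed definition of Poisson scaling; counts are non-negative)."""
    mean = X.mean(axis=0)
    scale = np.where(mean > 0, np.sqrt(mean), 1.0)  # leave empty channels unscaled
    return X / scale


def poisson_mean_center(X):
    """Poisson scaling followed by mean centering."""
    return mean_center(poisson_scale(X))


# Two hypothetical spectra with two channels each; channel means are 4 and 9.
counts = np.array([[2.0, 6.0],
                   [6.0, 12.0]])
print(poisson_mean_center(counts))   # [[-1. -1.], [ 1.  1.]]
print(pareto_scale(counts))
```

The reasoning behind Poisson scaling is that the variance of a Poisson count equals its mean, so dividing by the square root of the mean brings the noise in intense and weak channels to a comparable level before the model is fitted.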
The iSPA-PLS variable selection method improved the models' predictive ability with mean centering and Pareto scaling. However, the
Author Contributions
Conceptualization: F.R.S. and F.L.M.; methodology and software: F.R.S., F.L.M. and E.B.; validation: J.F.O. and F.L.M.; formal analysis: F.R.S., J.F.O. and F.L.M.; investigation: F.R.S.; resources: F.R.S., F.L.M. and G.M.C.B.; data curation: F.R.S.; writing – original draft preparation: F.R.S. and F.L.M.; writing – review and editing: F.R.S., J.F.O., E.B., G.M.C.B. and F.L.M.; visualization: F.R.S. and F.L.M.; supervision: G.M.C.B. and F.L.M.; project administration: F.R.S. and F.L.M.; funding
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors acknowledge the support of CNPq (PhD scholarship, grant number 142985/2016-3, and CNPq 304722/2017-0), INCT (464898/2014-5) and the IBITIBA research project (32.02.120.00.00). The authors also thank Dr. Adriano de Araújo Gomes for the iSPA-PLS routine.
References (63)
- et al. Assessment of the physical properties, and the hydrogen, carbon, and oxygen content in plastics using energy-dispersive X-ray fluorescence spectrometry. Spectrochim. Acta Part B At. Spectrosc. (2020)
- et al. EDXRF spectral data combined with PLSR to determine some soil fertility indicators. Microchem. J. (2020)
- et al. Quick analysis of organic matter in soil by energy-dispersive X-ray fluorescence and multivariate analysis. Appl. Radiat. Isot. (2017)
- et al. Determination of base saturation percentage in agricultural soils via portable X-ray fluorescence spectrometer. Geoderma (2019)
- et al. Soil texture prediction in tropical soils: A portable X-ray fluorescence spectrometry approach. Geoderma (2020)
- et al. Synthesized use of VisNIR DRS and PXRF for soil characterization: Total carbon and total nitrogen. Geoderma (2015)
- et al. Energy dispersive X-ray fluorescence and scattering assessment of soil quality via partial least squares and artificial neural networks analytical modeling approaches. Talanta (2012)
- et al. X-ray scattering and multivariate analysis for classification of organic samples: a comparative study using Rh tube and synchrotron radiation. Anal. Chim. Acta (2007)
- et al. Determination of the polymeric thin film thickness by energy dispersive X-ray fluorescence and multivariate analysis. Spectrochim. Acta Part B At. Spectrosc. (2020)
- et al. A selective review and comparison for interval variable selection in spectroscopic modeling. Chemom. Intell. Lab. Syst. (2018)
- Determination of metal content in industrial powder ink and paint thickness over steel plates using X-ray fluorescence. Appl. Radiat. Isot.
- Optimal scaling of TOF-SIMS spectrum-images prior to multivariate statistical analysis. Appl. Surf. Sci.
- Breaking with trends in pre-processing? TrAC Trends Anal. Chem.
- Review of the most common pre-processing techniques for near-infrared spectra. TrAC Trends Anal. Chem.
- Partial least-squares regression: a tutorial. Anal. Chim. Acta
- Determining the composition of mineral-organic mixes using UV-vis-NIR diffuse reflectance spectroscopy. Geoderma
- Development and validation of a chemometric method for direct determination of hydrochlorothiazide in pharmaceutical samples by diffuse reflectance near infrared spectroscopy. Microchem. J.
- Practical guidelines for reporting results in single- and multi-component analytical calibration: a tutorial. Anal. Chim. Acta
- Detection of nonlinearity in multivariate calibration. Anal. Chim. Acta
- Nomenclature in evaluation of analytical methods including detection and quantification capabilities (IUPAC recommendations 1995). Anal. Chim. Acta
- Comparing the predictive accuracy of models using a simple randomization test. Chemom. Intell. Lab. Syst.
- Use of FT-NIR spectrometry in non-invasive measurements of soluble solid contents (SSC) of ‘Fuji’ apple based on different PLS models. Chemom. Intell. Lab. Syst.
- The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemom. Intell. Lab. Syst.
- The successive projections algorithm for interval selection in PLS. Microchem. J.
- Spatial variability of soil physical properties in the field
- Chapter One - Mineral–Organic Associations: Formation, Properties, and Relevance in Soil Environments
- Soil organic carbon prediction by hyperspectral remote sensing and field Vis-NIR spectroscopy: an Australian case study. Geoderma
- Comparing different multivariate calibration methods for the determination of soil organic carbon pools with visible to near infrared spectroscopy. Geoderma
- Robust vis-NIRS models for rapid assessment of soil organic carbon and nitrogen in Feralsols haplic soils from different tillage management practices. Comput. Electron. Agric.
- On-the-go VisNIR: potential and limitations for mapping soil clay and organic carbon. Comput. Electron. Agric.
- Visible and near infrared spectroscopy in soil science