RESEARCH PAPER
Adaptive Variable Re-weighting and Shrinking Approach for Variable Selection in Multivariate Calibration for Near-infrared Spectroscopy

https://doi.org/10.1016/S1872-2040(21)60102-0

Abstract

High-dimensional data from spectroscopic instruments make variable selection an important step in spectral modeling. This research proposes a new variable selection algorithm, the adaptive variable re-weighting and shrinking approach (AVRSA), based on model population analysis (MPA) and weighted bootstrap sampling (WBS). In each iteration, WBS generates sub-datasets for modeling, and the variable weights and the variable space are updated by statistically evaluating the optimal sub-models. Unlike most other variable selection methods, AVRSA requires the average prediction performance of the optimal sub-models in each iteration to be better than that of the previous iteration. Iteration continues until no further optimal sub-models are generated, at which point the most informative variables are obtained. The method is tested on three near-infrared (NIR) datasets and compared with three variable selection methods: competitive adaptive reweighted sampling (CARS), Monte Carlo uninformative variable elimination (MC-UVE) and iteratively variable subset optimization (IVSO). Compared with these algorithms, AVRSA selects the fewest variables, which is convenient for the development of portable instruments. Compared with the full-spectrum partial least squares (PLS) model, the root mean square error of prediction on the validation set (RMSEP) decreases from 0.2614 to 0.1093 for corn starch and from 0.0977 to 0.0374 for corn protein. In addition, the RMSEP decreases from 0.2585 to 0.2157 for the wheat dataset and from 0.7816 to 0.6661 for the wheat kernel dataset. The results show that the proposed method efficiently finds informative variables in high-dimensional spectra and improves the model's prediction performance.
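To put the quoted RMSEP values on a common scale, the short sketch below computes the relative reduction for each dataset. The numbers are taken directly from the abstract; the dictionary layout and the helper name `relative_reduction` are illustrative choices, not part of the paper.

```python
# RMSEP values quoted in the abstract: (full-spectrum PLS, AVRSA)
rmsep = {
    "corn starch": (0.2614, 0.1093),
    "corn protein": (0.0977, 0.0374),
    "wheat": (0.2585, 0.2157),
    "wheat kernel": (0.7816, 0.6661),
}


def relative_reduction(before, after):
    """Fractional decrease of `after` relative to `before`."""
    return (before - after) / before


reductions = {name: relative_reduction(b, a) for name, (b, a) in rmsep.items()}

for name, r in reductions.items():
    print(f"{name}: RMSEP lower by {100 * r:.1f}%")
```

On these figures the two corn properties show a reduction of roughly 60%, while the wheat and wheat kernel datasets show a reduction of roughly 15%.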

Graphical abstract

A new variable selection method, named AVRSA, is proposed. It uses weighted bootstrap sampling to generate a random sub-model group, then selects the sub-models whose performance is better than the average performance of the optimal sub-model group from the previous iteration to form a new optimal sub-model group, and finally updates the variable weights and shrinks the variable space by evaluating the new optimal sub-model group. These steps are repeated until no optimal sub-model is generated, at which point the optimal informative variables are obtained.
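The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the first-round threshold, the weight formula, or the shrinking rule, so the sketch assumes: equal starting weights, a first-round threshold equal to the mean error of the random group, weights equal to each variable's frequency in the optimal group, and removal of variables that never appear in it. The function names and the `error_of` callback are hypothetical; in practice `error_of` would be something like a cross-validated PLS error, replaced here by a toy function.

```python
import random


def weighted_bootstrap(active, weights, rng):
    """Draw len(active) variables with replacement, proportional to weight,
    and keep the distinct ones as one candidate variable subset."""
    drawn = rng.choices(active, weights=[weights[v] for v in active], k=len(active))
    return tuple(sorted(set(drawn)))


def avrsa_sketch(variables, error_of, n_submodels=200, max_rounds=50, seed=0):
    rng = random.Random(seed)
    active = list(variables)
    weights = {v: 1.0 for v in active}   # assumption: equal starting weights
    threshold = None                     # mean error of the last optimal group
    best_subset = tuple(active)
    best_error = error_of(best_subset)

    for _ in range(max_rounds):
        subsets = [weighted_bootstrap(active, weights, rng) for _ in range(n_submodels)]
        scored = [(error_of(s), s) for s in subsets]
        if threshold is None:
            # assumption: the first round compares against the random group's mean
            threshold = sum(e for e, _ in scored) / len(scored)
        # keep only sub-models that beat the previous optimal group's average
        optimal = [(e, s) for e, s in scored if e < threshold]
        if not optimal:
            break                        # no optimal sub-model generated: stop
        threshold = sum(e for e, _ in optimal) / len(optimal)

        # re-weight by frequency in the optimal group; shrink by dropping zeros
        counts = {v: 0 for v in active}
        for _, subset in optimal:
            for v in subset:
                counts[v] += 1
        weights = {v: c / len(optimal) for v, c in counts.items() if c > 0}
        active = sorted(weights)

        round_error, round_subset = min(optimal)
        if round_error < best_error:
            best_error, best_subset = round_error, round_subset
    return best_subset, best_error


# Toy stand-in for a model error: penalise missing informative variables
# heavily and each uninformative variable lightly (purely illustrative).
INFORMATIVE = {3, 7, 11}


def toy_error(subset):
    chosen = set(subset)
    return 1.0 * len(INFORMATIVE - chosen) + 0.05 * len(chosen - INFORMATIVE)


selected, err = avrsa_sketch(range(30), toy_error)
print(selected, err)
```

Because each new optimal group must beat the mean error of the previous one, the threshold decreases every round, so the loop stops on its own; `max_rounds` is only a safety cap.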


This work was supported by the National Natural Science Foundation of China (No. 31871571), the Key Technologies R&D Program of Shanxi Province, China (Nos. 201903D211002, 201603D221037-3), the Shanxi Province Applied Basic Research Project of China (No. 201801D221299), and the Science and Technology Innovation Fund of Shanxi Agricultural University, China (Nos. 201308, 2020BQ32).