Variable selection of spectroscopic data through monitoring both location and dispersion of PLS loading weights

  • Research Article
  • Journal of the Korean Statistical Society

Abstract

High-dimensional data sets with small sample sizes are common in most of the sciences, and variable selection contributes to better prediction of real-life phenomena. Partial least squares (PLS), a multivariate approach, can model high-dimensional data in which the sample size is usually smaller than the number of variables. Truncation for variable selection in PLS (\(T-PLS\)) is considered a reference method. \(T-PLS\) and many other methods monitor only the location of the PLS loading weights for variable selection. In this article, we propose monitoring both the location and the dispersion of the PLS loading weights for variable selection in high-dimensional spectral data. The proposed PLS variants monitor the location, the dispersion, both location and dispersion, or at least one of location and dispersion of the PLS loading weights, and are denoted by \(X-PLS\), \(S-PLS\), \(X \& S-PLS\) and \(X|S-PLS\), respectively. The proposed variants are compared with standard PLS and \(T-PLS\) through a Monte Carlo simulation of 100 runs on simulated data and on real data sets, which cover the prediction of corn, milk, and oil contents from spectroscopic data. \(X \& S-PLS\) shows the best capability in selecting the true variables in the simulated data. The comparison of validated RMSE indicates that \(X|S-PLS\) and \(X \& S-PLS\) outperform the other methods in predicting corn, milk, and oil contents. \(X \& S-PLS\) selects the smallest number of variables. Interestingly, the variables selected by \(X \& S-PLS\) are also more consistent than those selected by any other method. Hence \(X \& S-PLS\) appears to be a promising candidate for variable selection in high-dimensional data.
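
The idea of monitoring both location and dispersion of loading weights can be illustrated with a minimal sketch. The code below is not the authors' procedure: it assumes a plain NIPALS PLS1 fit, treats each variable's loading weights across components as a subgroup, and uses simple hypothetical limits (`k` standard deviations from the average) in place of the control limits defined in the article. The function names `pls1_loading_weights` and `select_variables`, the threshold `k`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def pls1_loading_weights(X, y, n_components):
    """NIPALS PLS1 on centred data; returns a (p, n_components) matrix
    whose columns are unit-norm loading weight vectors."""
    Xr = X - X.mean(axis=0)
    yr = y - y.mean()
    weights = []
    for _ in range(n_components):
        w = Xr.T @ yr                    # covariance direction between X and y
        w /= np.linalg.norm(w)           # normalise the loading weights
        t = Xr @ w                       # scores
        tt = t @ t
        p_load = Xr.T @ t / tt           # X loadings
        q = (yr @ t) / tt                # y loading
        Xr = Xr - np.outer(t, p_load)    # deflate X
        yr = yr - q * t                  # deflate y
        weights.append(w)
    return np.column_stack(weights)

def select_variables(W, k=2.0):
    """Flag variables by location and dispersion of their loading weights.
    Assumed rule (hypothetical limits, needs at least 2 components):
    location signals when a row mean lies more than k SDs from the
    average row mean; dispersion signals when a row SD exceeds the
    average row SD by more than k SDs."""
    xbar = W.mean(axis=1)                # location per variable
    s = W.std(axis=1, ddof=1)            # dispersion per variable
    loc = np.abs(xbar - xbar.mean()) > k * xbar.std(ddof=1)
    disp = s > s.mean() + k * s.std(ddof=1)
    return {"X": loc, "S": disp, "X&S": loc & disp, "X|S": loc | disp}

# Simulated example: 40 samples, 200 variables, the first 10 are relevant.
rng = np.random.default_rng(0)
n, p = 40, 200
X = rng.normal(size=(n, p))
y = X[:, :10].sum(axis=1) + rng.normal(scale=0.5, size=n)

W = pls1_loading_weights(X, y, n_components=3)
masks = select_variables(W)
for name, mask in masks.items():
    print(name, "selects", int(mask.sum()), "variables")
```

By construction, the `X&S` rule can only select variables that both the `X` and `S` rules flag, so it never selects more variables than either single rule, which is consistent with the abstract's remark that \(X \& S-PLS\) selects the smallest number of variables. The `X|S` rule selects the union.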

Author information

Corresponding author

Correspondence to Tahir Mehmood.

Cite this article

Mehmood, T., Turk, A.M. Variable selection of spectroscopic data through monitoring both location and dispersion of PLS loading weights. J. Korean Stat. Soc. 50, 905–917 (2021). https://doi.org/10.1007/s42952-020-00098-x

  • DOI: https://doi.org/10.1007/s42952-020-00098-x
