The applicability of vibrational spectroscopy and multivariate analysis for the characterization of animal feed where the reference values do not follow a normal distribution: A new chemometric challenge posed at the ‘Chimiométrie 2019’ congress

https://doi.org/10.1016/j.chemolab.2020.104026

Abstract

A chemometric challenge was posed at the annual ‘Chimiométrie’ congress organized by the French Chemometrics Society in February 2019. The congress was held in Montpellier, and the data for the challenge are available on the congress website (https://chemom2019.sciencesconf.org/). The aim of the challenge was to test the participants' ability to characterize animal feed by NIR spectroscopy when the reference values for three different ingredients do not follow a normal distribution. This paper summarizes the five best approaches put forward by participants.

Introduction

As in previous years [1–5], a challenge was posed at the ‘Chimiométrie’ congress, held in Montpellier in February 2019. It concerned the applicability of spectroscopy and multivariate analysis to the characterization of animal feed where the reference values do not follow a normal distribution. Specifically, part of a dataset provided by the University of Córdoba (UCO), Spain, was proposed as a challenge to the Chimiométrie 2019 conference participants [6].

In multivariate analysis, the distribution of the reference values quite often does not follow a standard distribution, in which case a classical regression model can fail. For such situations, Fearn et al. proposed a Bayesian approach [7], in which a model explaining the dependence of the spectral data on the reference values is combined with a prior distribution representing beliefs about the composition of the sample to be predicted. Using the same data as [6], they showed that a Bayesian approach could be a relevant technique for predicting the percentage of ingredients in a complete feed.
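The combination of a spectral model with a prior can be written generically with Bayes' rule. The notation below is an assumption introduced here for illustration (x for a measured spectrum, y for the vector of ingredient percentages); it is not taken from [7] and says nothing about the specific likelihood or prior used there.

```latex
% Generic Bayes' rule; x = spectrum, y = ingredient percentages (assumed notation)
p(y \mid x) \;=\; \frac{p(x \mid y)\, p(y)}{\int p(x \mid y')\, p(y')\, \mathrm{d}y'}
\;\propto\; p(x \mid y)\, p(y)
```

Here p(x | y) plays the role of the model for the dependence of the spectral data on the reference values, and p(y) is the prior on composition, which need not be normal.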

The aim of the challenge at the ‘Chimiométrie’ congress was to build multivariate models for three different ingredients used in the production of animal feed, and then to predict those ingredients from the blind spectra of an independent test set.

This article presents the five best approaches among the seven solutions received.

Dataset and challenge

The spectra were measured in reflection mode on farm animal feed samples of known composition. No chemical composition is given, only the percentages of the different ingredients. Because the choice of ingredients in feed plants is very large, only three were considered: soya oil (y1), lucerne (y2) and barley (y3). The participants received the NIR spectra and the reference values for the calibration set, and only the NIR spectra for the test set. They were not

Participant 1

First, a series of preprocessing techniques was applied to the dataset, which was then randomly split into a training set (95%) and a test set (5%). Predictive ability was evaluated from the RMSEP on the test set, after PLS regression models for each response had been trained with 10-fold cross-validation (CV). The best strategy, giving the minimum sum of RMSEP across all responses, was to apply the standard normal variate (SNV) followed by linear detrending and standardization (autoscaling).
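The preprocessing chain named above (SNV, linear detrend, autoscale) and the 95%/5% random split can be sketched as follows. This is a minimal illustration on synthetic data, not Participant 1's code: the function names, the random spectra, and the choice to estimate the autoscaling parameters on the training set only are assumptions. The PLS regression and cross-validation steps are left out; a library implementation such as scikit-learn's `PLSRegression` could be fitted on the scaled training data.

```python
import numpy as np

def snv(X):
    """Standard normal variate: center and scale each spectrum (row)."""
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std

def linear_detrend(X):
    """Subtract a least-squares straight line from each spectrum (row)."""
    idx = np.arange(X.shape[1], dtype=float)
    # polyfit fits every column of X.T at once; coefs has shape (2, n_samples)
    coefs = np.polyfit(idx, X.T, deg=1)
    trend = np.outer(coefs[0], idx) + coefs[1][:, None]
    return X - trend

def autoscale_fit(X):
    """Column-wise mean and standard deviation used for autoscaling."""
    return X.mean(axis=0), X.std(axis=0, ddof=1)

def autoscale_apply(X, mean, std):
    """Center and scale each variable (column) with given parameters."""
    return (X - mean) / std

# Synthetic "spectra" with a sloping baseline (hypothetical data)
rng = np.random.default_rng(0)
n_samples, n_wavelengths = 100, 60
baseline = rng.normal(size=(n_samples, 1)) * np.linspace(0.0, 1.0, n_wavelengths)
X = rng.normal(size=(n_samples, n_wavelengths)) + baseline

# Row-wise preprocessing can be applied before splitting
X_snv = snv(X)
X_pre = linear_detrend(X_snv)

# Random 95% / 5% split
perm = rng.permutation(n_samples)
n_train = int(round(0.95 * n_samples))
train_idx, test_idx = perm[:n_train], perm[n_train:]

# Autoscaling parameters come from the training set only (assumed choice)
col_mean, col_std = autoscale_fit(X_pre[train_idx])
X_train_scaled = autoscale_apply(X_pre[train_idx], col_mean, col_std)
X_test_scaled = autoscale_apply(X_pre[test_idx], col_mean, col_std)
```

SNV and detrending act on each spectrum independently, so applying them before the split leaks no information from the test set. Autoscaling is column-wise, which is why its parameters are estimated on the training samples alone in this sketch.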

Debriefing

None of the participants discovered the X shift, although it could be seen in several ways. The first was to project the test set onto the PC scores calculated for the calibration set: the shift was not visible on the first PCs but was obvious on the 13th to 20th PCs. A simpler way was to examine the classical Hotelling's T2 vs Q (X residuals) plot (Fig. 1).
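As an illustration of the Hotelling's T2 and Q diagnostics mentioned above, the sketch below fits a PCA model on a calibration set and computes both statistics for calibration and test samples. Everything here is assumed for the example: the data are synthetic, the "shift" is simulated as a constant offset added to the test spectra, and the number of components is arbitrary. It does not reproduce the challenge data or Fig. 1.

```python
import numpy as np

def fit_pca(X_cal, n_components):
    """PCA on the calibration set via SVD of the mean-centered data."""
    mean = X_cal.mean(axis=0)
    _, s, Vt = np.linalg.svd(X_cal - mean, full_matrices=False)
    loadings = Vt[:n_components].T                      # shape (p, k)
    score_var = s[:n_components] ** 2 / (X_cal.shape[0] - 1)
    return mean, loadings, score_var

def t2_and_q(X, mean, loadings, score_var):
    """Hotelling's T2 and Q (squared X residual) for each row of X."""
    Xc = X - mean
    scores = Xc @ loadings
    t2 = np.sum(scores ** 2 / score_var, axis=1)
    residuals = Xc - scores @ loadings.T
    q = np.sum(residuals ** 2, axis=1)
    return t2, q

# Synthetic low-rank "spectra" plus small noise (hypothetical data)
rng = np.random.default_rng(0)
n_cal, n_test, p, k = 200, 50, 50, 3
basis = rng.normal(size=(k, p))
X_cal = rng.normal(size=(n_cal, k)) @ basis + 0.01 * rng.normal(size=(n_cal, p))

# Test set from the same model plus a constant unit-norm offset (simulated shift)
offset = rng.normal(size=p)
offset /= np.linalg.norm(offset)
X_test = (rng.normal(size=(n_test, k)) @ basis
          + 0.01 * rng.normal(size=(n_test, p)) + offset)

mean, loadings, score_var = fit_pca(X_cal, k)
t2_cal, q_cal = t2_and_q(X_cal, mean, loadings, score_var)
t2_test, q_test = t2_and_q(X_test, mean, loadings, score_var)

print("mean Q, calibration:", q_cal.mean())
print("mean Q, test:       ", q_test.mean())
```

Plotting `t2_cal` against `q_cal` and overlaying `t2_test` against `q_test` gives the T2 vs Q view. In this synthetic setting most of the offset lies outside the retained components, so it appears mainly as inflated Q values for the test samples.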

A third way was to plot the spectra for the calibration set together with those of the test set with

Conclusion

Dealing with a large dataset, the 2019 challenge demonstrated the efficiency of discriminant analysis for detecting the presence of the ingredients and of local or nonlinear regression for quantifying them. As in previous editions, the aim of the paper was not to compare different techniques, nor to indicate whether one procedure is better than another, but simply to show different alternatives for the same problem. The differences between the methods are relatively small in

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We thank the University of Córdoba (UCO) in Spain for supplying the spectra included in this paper.

We would also like to thank all the participants who spent time analyzing the data and presenting their results.
