Valid machine learning algorithms for multiparameter methods

  • General Paper
  • Accreditation and Quality Assurance

Abstract

In light of recent food fraud cases, food authenticity is receiving increasing attention, and new analytical methods and evaluation approaches are being proposed. In this context, the evaluation of mass spectral profiles is a promising avenue, e.g. for determining the origin of food. Relevant evaluation approaches include principal component analysis, artificial neural networks, random forests and support vector machines. The aim is to derive decision rules for assigning unknown samples to different classes. These decision rules are derived from samples whose origin is known (the training set). Typically, a reliable evaluation requires the number of samples to be considerably larger than the number of features. In multiparameter mass spectrometry methods, however, this ratio is inverted, with, e.g. 100 samples versus 10 000 features. This paper discusses two approaches for establishing reliable decision rules in spite of low sample numbers.
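
Why a small training set combined with many features is problematic can be illustrated with a minimal sketch. It is not taken from the paper: the data are synthetic Gaussian noise, the sample size (5 samples per class) is deliberately much smaller than the 100 samples mentioned above so that the effect shows up with single-feature threshold rules, and the function name `count_separating_features` is hypothetical. The sketch counts how many of 10 000 pure-noise features separate two classes perfectly on the training set, although none of them carries any information about class membership.

```python
import math
import random


def count_separating_features(n_per_class=5, n_features=10_000, seed=0):
    """Count pure-noise features that perfectly split two training classes."""
    rng = random.Random(seed)
    n_perfect = 0
    for _ in range(n_features):
        # Each feature is independent Gaussian noise, unrelated to the class label.
        class_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
        class_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
        # A single threshold classifies the training set perfectly
        # whenever the value ranges of the two classes do not overlap.
        if max(class_a) < min(class_b) or max(class_b) < min(class_a):
            n_perfect += 1
    return n_perfect


n_perfect = count_separating_features()

# Chance of a perfect split for one noise feature: 2 of the C(10, 5)
# equally likely ways to place 5 "class A" ranks among 10 sorted values.
expected = 10_000 * 2 / math.comb(10, 5)

print(f"noise features with a perfect training split: {n_perfect}")
print(f"expected by chance alone: {expected:.1f}")
```

For a single noise feature, the probability of a perfect split is 2/252, so roughly 79 of the 10 000 features are expected to separate the training samples by chance alone. A decision rule built on any one of them would look flawless on the training set and perform no better than guessing on new samples.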

Notes

  1. In mathematical terms, this “shift” consists in performing a basis transformation, with the eigenvectors of the data’s covariance matrix as the new basis.

  2. PCA is sensitive to different types of data standardization [9].

  3. The value 8.52 is the t quantile for 13 degrees of freedom (corresponding to the 14 “Red snapper” samples) at a one-sided alpha value of 0.5 % (i.e. a two-sided alpha value of 1 %), corrected for multiple testing by dividing the 0.5 % value by 8971, the number of peaks.

  4. Typical activation functions include the identity function, the rectified linear unit and the inverse logit.

  5. Representative means that each individual belonging to said population has the same probability of being sampled.

  6. Such an interlaboratory study is currently in the planning phase.

  7. Since the analysis is multi-dimensional, random variability is characterized by a covariance matrix rather than a standard deviation.

  8. The prediction ellipsoid also makes it possible to compute a metric (distance function) that measures the distance between each point and the reference population; see, for instance, the Mahalanobis distance. Such a metric would allow a more refined analysis of the data and the establishment of more sophisticated decision rules.

  9. Such an interlaboratory study is currently in the planning phase.
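
The critical value quoted in Note 3 can be checked numerically. The following is a minimal sketch under stated assumptions, not the authors' computation: it uses only the Python standard library, evaluates the upper tail of the t distribution through the regularized incomplete beta function with Simpson's rule, and inverts it by bisection. The function names `t_upper_tail` and `t_quantile_upper` are hypothetical, and the integration as written is only suitable for more than 2 degrees of freedom.

```python
import math


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t with df > 2 degrees of freedom and t > 0.

    Uses P(T > t) = 0.5 * I_x(df/2, 1/2) with x = df / (df + t^2), where I_x
    is the regularized incomplete beta function. `steps` must be even.
    """
    a, b = df / 2.0, 0.5
    x = df / (df + t * t)
    h = x / steps

    def g(s):
        # Integrand of the incomplete beta function.
        return s ** (a - 1.0) * (1.0 - s) ** (b - 1.0)

    # Composite Simpson's rule on [0, x].
    total = g(0.0) + g(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    integral = total * h / 3.0

    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return 0.5 * integral / math.exp(log_beta)


def t_quantile_upper(alpha, df, lo=1.0, hi=100.0):
    """Find t with P(T > t) = alpha by bisection (the tail decreases in t)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # tail still too heavy: the quantile lies further out
        else:
            hi = mid
    return 0.5 * (lo + hi)


n_peaks = 8971           # number of peaks tested, from Note 3
df = 14 - 1              # 14 "Red snapper" samples
alpha_one_sided = 0.005  # 0.5 %, i.e. 1 % two-sided

uncorrected_t = t_quantile_upper(alpha_one_sided, df)
critical_t = t_quantile_upper(alpha_one_sided / n_peaks, df)

print(f"uncorrected quantile: {uncorrected_t:.2f}")
print(f"quantile corrected for {n_peaks} tests: {critical_t:.2f}")
```

With the inputs from Note 3, the corrected quantile should come out close to the quoted 8.52, compared with roughly 3.01 without the multiple-testing correction. Dividing the alpha value by the number of tests is the simplest (Bonferroni-type) correction; it is conservative when the peaks are correlated.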

References

  1. Blum KM et al (2017) Non-target screening and prioritization of potentially persistent, bioaccumulating and toxic domestic wastewater contaminants and their removal in on-site and large-scale sewage treatment plants. Sci Total Environ 575:265–275

  2. Non-target screening—a powerful tool for selecting environmental pollutants (M-27/2013). Norwegian Environment Agency http://www.miljodirektoratet.no/Documents/publikasjoner/M27/M27.pdf. Accessed 16 Apr 2019

  3. Commission Regulation (EU) No 10/2011 of 14 January 2011 on plastic materials and articles intended to come into contact with food (Text with EEA relevance). http://data.europa.eu/eli/reg/2011/10/oj. Accessed 16 Apr 2019

  4. Geueke B (2018) Non-intentionally added substances. Food Packag Forum. https://doi.org/10.5281/zenodo.1265331

  5. Girolami M, Mischak H, Krebs R (2006) Analysis of complex, multidimensional datasets. Drug Discov Today Technol 3(1):13–19. https://doi.org/10.1016/j.ddtec.2006.03.010

  6. Decramer S et al (2006) Predicting the clinical outcome of congenital unilateral ureteropelvic junction obstruction in newborn by urinary proteome analysis. Nat Med 12:398. https://doi.org/10.1038/nm1384

  7. Magnusson B, Örnemark U (eds) (2014) Eurachem guide: the fitness for purpose of analytical methods – a laboratory guide to method validation and related topics, 2nd edn. Available from https://www.eurachem.org (ISBN 978-91-87461-59-0)

  8. Lasch P (2017) Entwicklung einer Peptidmarker-basierten LC-ESI-MS/MSMS-Methode zur Authentizitätsprüfung von Fischspezies. Master Thesis, University of Applied Sciences Bremerhaven

  9. Uhlig S, Eichler S (2011) Are the results of customary methods for analyzing dioxin and dioxin-like compound congener profiles court-proof? J Chromatogr A 1218:5688–5693

  10. Curry B, Rumelhart DE (1990) MSnet: a neural network that classifies mass spectra. Tetrahedron Comput Methodol 3(3–4):213–237. https://doi.org/10.1016/0898-5529(90)90053-B

  11. Akees GmbH, Ansbacher Str. 11, 10787 Berlin, Germany

Author information

Corresponding author

Correspondence to Steffen Uhlig.

About this article

Cite this article

Uhlig, S., Colson, B., Hettwer, K. et al. Valid machine learning algorithms for multiparameter methods. Accred Qual Assur 24, 271–279 (2019). https://doi.org/10.1007/s00769-019-01384-w

  • DOI: https://doi.org/10.1007/s00769-019-01384-w
