Abstract
In the light of recent food fraud cases, food authenticity is receiving increasing attention, and new analytical methods and evaluation approaches are being proposed. In this context, the evaluation of mass spectral profiles is a promising avenue, e.g. for determining the origin of food. Relevant evaluation approaches include principal component analysis, artificial neural networks, random forests and support vector machines. The aim is to derive decision rules for assigning unknown samples to different classes. These decision rules are derived from samples whose origin is known (the training set). Typically, a reliable evaluation requires the number of samples to be considerably larger than the number of features. In multiparameter mass spectrometry methods, however, this ratio is inverted, with, e.g. 100 samples versus 10 000 features. This paper discusses two approaches for establishing reliable decision rules despite low sample numbers.
Notes
In mathematical terms, this “shift” consists in performing a basis transformation, with the eigenvectors of the data’s covariance matrix as the new basis.
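The basis transformation can be written out as follows. The notation is an assumption made here for illustration and is not taken from the article: X denotes the column-centered data matrix with n samples (rows) and p features (columns), C its sample covariance matrix, V the matrix whose columns are the orthonormal eigenvectors of C, and Λ the diagonal matrix of the corresponding eigenvalues.

```latex
\begin{aligned}
C &= \frac{1}{n-1}\, X^{\top} X, \\
C &= V \Lambda V^{\top}, \qquad V^{\top} V = I, \\
Z &= X V, \qquad \frac{1}{n-1}\, Z^{\top} Z = V^{\top} C V = \Lambda .
\end{aligned}
```

In the new basis the coordinates Z (the scores) are uncorrelated, and the eigenvalues on the diagonal of Λ give the variance along each principal component.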
PCA is sensitive to different types of data standardization [9].
The value 8.52 is the t quantile for a one-sided alpha value of 0.5 % (corresponding to a two-sided alpha value of 1 %), corrected for multiple testing (the 0.5 % value is divided by 8971, the number of peaks), with 13 degrees of freedom (corresponding to the 14 “Red snapper” samples).
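The arithmetic behind this threshold can be checked with a short script. The following is a minimal sketch, not the authors' computation: it uses only the Python standard library and obtains the quantile by numerically integrating the Student t density (Simpson's rule) and bisecting on the tail probability. The helper names `t_pdf`, `t_upper_tail` and `t_upper_quantile` are hypothetical, and the integration assumes more than one degree of freedom.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, intervals=2000):
    """P(T > t) for t > 0 and df > 1.

    The integral over [t, infinity) is mapped to (0, 1] with the
    substitution x = t / s and evaluated with Simpson's rule.
    """
    def integrand(s):
        if s == 0.0:
            return 0.0  # limit of the integrand for df > 1
        return t_pdf(t / s, df) * t / (s * s)

    h = 1.0 / intervals
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


def t_upper_quantile(alpha, df, lo=0.0, hi=100.0):
    """Value q with P(T > q) = alpha, found by bisection on [lo, hi]."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # tail still too heavy, the quantile lies further out
        else:
            hi = mid
    return (lo + hi) / 2


# One-sided alpha of 0.5 %, Bonferroni-corrected for 8971 peaks,
# with 13 degrees of freedom (14 samples minus one).
alpha = 0.005 / 8971
threshold = t_upper_quantile(alpha, df=13)
print(round(threshold, 2))
```

The printed value should land close to the 8.52 quoted in the note. In practice a statistics library would supply the quantile directly, for example `scipy.stats.t.ppf(1 - alpha, df)`.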
Typical activation functions include the identity function, the rectified linear unit and the inverse logit.
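For readers less familiar with these functions, the sketch below shows the three activation functions named in the note and how one of them would be applied to the weighted input of a single neuron. The function `neuron` and the example weights, bias and inputs are hypothetical and serve only as illustration.

```python
import math


def identity(x):
    """Identity activation: passes the input through unchanged."""
    return x


def relu(x):
    """Rectified linear unit: zero for negative inputs, linear otherwise."""
    return max(0.0, x)


def inverse_logit(x):
    """Inverse logit (logistic sigmoid), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def neuron(inputs, weights, bias, activation):
    """Output of one neuron: activation applied to the weighted sum plus bias."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Hypothetical inputs and weights, for illustration only.
for name, fn in [("identity", identity), ("relu", relu),
                 ("inverse_logit", inverse_logit)]:
    print(name, neuron([1.0, 2.0], [0.5, -1.0], 0.25, fn))
```

The identity leaves the weighted sum unchanged, the rectified linear unit clips negative values to zero, and the inverse logit maps any real input to the interval from 0 to 1.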
Representative means that each individual belonging to said population has the same probability of being sampled.
Such an interlaboratory study is currently in the planning phase.
Since the analysis is multi-dimensional, random variability is characterized by a covariance matrix rather than a standard deviation.
The prediction ellipsoid also makes it possible to compute a metric (distance function) that measures the distance between each point and the reference population; see, for instance, the Mahalanobis distance. Such a metric would allow a more refined analysis of the data and the establishment of more sophisticated decision rules.
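As an illustration of such a metric, the sketch below computes the Mahalanobis distance in two dimensions using only the Python standard library. It is a simplified example under stated assumptions, not the procedure used in the article: the reference population is summarized by its sample mean and sample covariance matrix, the 2×2 covariance matrix is inverted explicitly, and all data values and function names are hypothetical.

```python
import math


def mean_and_covariance_2d(points):
    """Sample mean and sample covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of a point from a 2-D reference population."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # Quadratic form (x - mean)^T C^{-1} (x - mean) with the explicit
    # inverse of the 2x2 covariance matrix.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Hypothetical reference scores (e.g. first two principal components).
reference = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
mean, cov = mean_and_covariance_2d(reference)

# A point along the main axis of the population and one off that axis.
print(mahalanobis_2d((3.5, 7.0), mean, cov))
print(mahalanobis_2d((3.5, 4.0), mean, cov))
```

Because the distance is scaled by the covariance matrix, a point lying off the main axis of the reference population receives a larger distance than a point at a similar Euclidean distance along that axis. A decision rule could then compare the distance with a chosen cut-off.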
References
Blum KM et al (2017) Non-target screening and prioritization of potentially persistent, bioaccumulating and toxic domestic wastewater contaminants and their removal in on-site and large-scale sewage treatment plants. Sci Total Environ 575:265–275
Non-target screening—a powerful tool for selecting environmental pollutants (M-27/2013). Norwegian Environment Agency http://www.miljodirektoratet.no/Documents/publikasjoner/M27/M27.pdf. Accessed 16 Apr 2019
Commission Regulation (EU) No 10/2011 of 14 January 2011 on plastic materials and articles intended to come into contact with food (Text with EEA relevance). http://data.europa.eu/eli/reg/2011/10/oj. Accessed 16 Apr 2019
Geueke B (2018) Non-intentionally added substances. Food Packag Forum. https://doi.org/10.5281/zenodo.1265331
Girolami M, Mischak H, Krebs R (2006) Analysis of complex, multidimensional datasets. Drug Discov Today Technol 3(1):13–19. https://doi.org/10.1016/j.ddtec.2006.03.010
Decramer S et al (2006) Predicting the clinical outcome of congenital unilateral ureteropelvic junction obstruction in newborn by urinary proteome analysis. Nat Med 12:398. https://doi.org/10.1038/nm1384
Magnusson B, Örnemark U (eds) (2014) Eurachem guide: the fitness for purpose of analytical methods – a laboratory guide to method validation and related topics, 2nd edn. Available from https://www.eurachem.org (ISBN 978-91-87461-59-0)
Lasch P (2017) Entwicklung einer Peptidmarker-basierten LC-ESI-MS/MSMS-Methode zur Authentizitätsprüfung von Fischspezies. Master Thesis, University of Applied Sciences Bremerhaven
Uhlig S, Eichler S (2011) Are the results of customary methods for analyzing dioxin and dioxin-like compound congener profiles court-proof? J Chromatogr A 1218:5688–5693
Curry B, Rumelhart DE (1990) MSnet: a neural network that classifies mass spectra. Tetrahedron Comput Methodol 3(3–4):213–237. https://doi.org/10.1016/0898-5529(90)90053-B
Akees GmbH, Ansbacher Str. 11, 10787 Berlin, Germany
Cite this article
Uhlig, S., Colson, B., Hettwer, K. et al. Valid machine learning algorithms for multiparameter methods. Accred Qual Assur 24, 271–279 (2019). https://doi.org/10.1007/s00769-019-01384-w