Valid machine learning algorithms for multiparameter methods

Uhlig, Steffen; Colson, Bertrand; Hettwer, Karina; Simon, Kirsten; Uhlig, Carsten; Wittke, Stefan; Stoyke, Manfred; Gowik, Petra

doi:10.1007/s00769-019-01384-w

General Paper
Published: 24 April 2019

Volume 24, pages 271–279, (2019)
Accreditation and Quality Assurance

Steffen Uhlig ORCID: orcid.org/0000-0001-8700-8686¹,
Bertrand Colson¹,
Karina Hettwer¹,
Kirsten Simon¹,
Carsten Uhlig²,
Stefan Wittke³,
Manfred Stoyke⁴ &
Petra Gowik⁴

Abstract

In the light of recent food fraud cases, food authenticity is receiving increasing attention, and new analytical methods and evaluation approaches are being proposed. In this framework, the evaluation of mass spectral profiles is a promising avenue, e.g. for the determination of food origin. Relevant evaluation approaches include principal component analysis, artificial neural networks, random forests and support vector machines. The aim is to derive decision rules for assigning unknown samples to different classes. These decision rules are derived from samples whose origin is known (the training set). Typically, a reliable evaluation requires the number of samples to be considerably larger than the number of features. However, in multiparameter mass spectrometry methods this ratio is inverted, with, e.g. 100 samples versus 10 000 features. This paper discusses two approaches for establishing reliable decision rules in spite of low sample numbers.

Notes

In mathematical terms, this “shift” consists in performing a basis transformation, with the eigenvectors of the data’s covariance matrix as the new basis.
PCA is sensitive to different types of data standardization [9].
The value 8.52 is the t quantile for 13 degrees of freedom (corresponding to the 14 “Red snapper” samples) at a one-sided alpha value of 0.5 % (i.e. a two-sided alpha value of 1 %), corrected for multiple testing by dividing the 0.5 % value by 8971, the number of peaks.
Typical activation functions include the identity function, the rectified linear unit and the inverse logit.
Representative means that each individual belonging to said population has the same probability of being sampled.
Such an interlaboratory study is currently in the planning phase.
Since the analysis is multi-dimensional, random variability is characterized by a covariance matrix rather than a standard deviation.
The prediction ellipsoid also makes it possible to compute a metric (distance function) that measures the distance between each point and the reference population (see, for instance, the Mahalanobis distance). This metric would allow a more refined analysis of the data and the establishment of more sophisticated decision rules.
Such an interlaboratory study is currently in the planning phase.
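
The critical value 8.52 quoted in the notes can be reproduced with a short calculation. The following is a minimal sketch in Python using only the standard library. The helper names `t_upper_tail` and `t_upper_quantile` are hypothetical and introduced here for illustration; the power-series evaluation and the bisection search are one assumed way of obtaining the quantile, not necessarily the procedure used by the authors.

```python
import math


def t_upper_tail(t, df, terms=500):
    """Upper-tail probability P(T > t) of Student's t distribution, for t > 0.

    Uses P(T > t) = 0.5 * I_u(df/2, 1/2) with u = df / (df + t^2), where the
    regularized incomplete beta function I_u is evaluated by a power series
    in u. The series is truncated after `terms` terms, which is adequate
    when u is not close to 1 (i.e. t is not close to 0).
    """
    a = df / 2.0
    u = df / (df + t * t)
    total = 0.0
    coeff = 1.0   # k-th binomial-series coefficient of (1 - s)^(-1/2)
    power = 1.0   # u^k
    for k in range(terms):
        total += coeff * power / (a + k)
        coeff *= (2 * k + 1) / (2 * k + 2)
        power *= u
    log_beta = math.lgamma(a) + math.lgamma(0.5) - math.lgamma(a + 0.5)
    return 0.5 * math.exp(a * math.log(u) - log_beta) * total


def t_upper_quantile(alpha, df, lo=1.0, hi=1000.0):
    """Value q with P(T > q) = alpha, found by bisection on [lo, hi].

    Assumes the solution lies inside the bracket [lo, hi].
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t_upper_tail(mid, df) > alpha:
            lo = mid   # tail still too large: move right
        else:
            hi = mid   # tail small enough: move left
    return 0.5 * (lo + hi)


n_peaks = 8971            # number of peaks (from the notes)
alpha_one_sided = 0.005   # 0.5 %, i.e. 1 % two-sided
df = 14 - 1               # 14 "Red snapper" samples -> 13 degrees of freedom

# Multiple-testing correction: divide alpha by the number of peaks
alpha_per_peak = alpha_one_sided / n_peaks
critical_value = t_upper_quantile(alpha_per_peak, df)
print(f"{critical_value:.2f}")
```

Dividing the 0.5 % level by the 8971 peaks gives a per-peak level of roughly 5.6e-7, which is why the resulting quantile is so much larger than the familiar uncorrected t quantiles for 13 degrees of freedom.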

References

Blum KM et al (2017) Non-target screening and prioritization of potentially persistent, bioaccumulating and toxic domestic wastewater contaminants and their removal in on-site and large-scale sewage treatment plants. Sci Total Environ 575:265–275
Non-target screening—a powerful tool for selecting environmental pollutants (M-27/2013). Norwegian Environment Agency http://www.miljodirektoratet.no/Documents/publikasjoner/M27/M27.pdf. Accessed 16 Apr 2019
Commission Regulation (EU) No 10/2011 of 14 January 2011 on plastic materials and articles intended to come into contact with food (Text with EEA relevance). http://data.europa.eu/eli/reg/2011/10/oj. Accessed 16 Apr 2019
Geueke B (2018) Non-intentionally added substances. Food Packag Forum. https://doi.org/10.5281/zenodo.1265331
Girolami M, Mischak H, Krebs R (2006) Analysis of complex, multidimensional datasets. Drug Discov Today Technol 3(1):13–19. https://doi.org/10.1016/j.ddtec.2006.03.010
Decramer S et al (2006) Predicting the clinical outcome of congenital unilateral ureteropelvic junction obstruction in newborn by urinary proteome analysis. Nat Med 12:398. https://doi.org/10.1038/nm1384
Magnusson B, Örnemark U (eds) (2014) Eurachem guide: the fitness for purpose of analytical methods – a laboratory guide to method validation and related topics, 2nd edn. Available from https://www.eurachem.org (ISBN 978-91-87461-59-0)
Lasch P (2017) Entwicklung einer Peptidmarker-basierten LC-ESI-MS/MSMS-Methode zur Authentizitätsprüfung von Fischspezies. Master Thesis, University of Applied Sciences Bremerhaven
Uhlig S, Eichler S (2011) Are the results of customary methods for analyzing dioxin and dioxin-like compound congener profiles court-proof? J Chromatogr A 1218:5688–5693
Curry B, Rumelhart DE (1990) MSnet: a neural network that classifies mass spectra. Tetrahedron Comput Methodol 3(3–4):213–237. https://doi.org/10.1016/0898-5529(90)90053-B
Akees GmbH, Ansbacher Str. 11, 10787 Berlin, Germany

Author information

Authors and Affiliations

QuoData GmbH, Prellerstr. 14, 01309, Dresden, Germany
Steffen Uhlig, Bertrand Colson, Karina Hettwer & Kirsten Simon
Akees GmbH, Ansbacher Str. 11, 10787, Berlin, Germany
Carsten Uhlig
University of Applied Sciences Bremerhaven, An der Karlstadt 8, 27568, Bremerhaven, Germany
Stefan Wittke
Federal Office of Consumer Protection and Food Safety, Mauerstr. 39-42, 10117, Berlin, Germany
Manfred Stoyke & Petra Gowik

Corresponding author

Correspondence to Steffen Uhlig.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Uhlig, S., Colson, B., Hettwer, K. et al. Valid machine learning algorithms for multiparameter methods. Accred Qual Assur 24, 271–279 (2019). https://doi.org/10.1007/s00769-019-01384-w

Received: 28 August 2018
Accepted: 08 April 2019
Published: 24 April 2019
Issue Date: 01 August 2019
DOI: https://doi.org/10.1007/s00769-019-01384-w
