Machine learning to predict retention time of small molecules in nano-HPLC

  • Research Paper
  • Analytical and Bioanalytical Chemistry

Abstract

Retention time is an important parameter for compound identification in untargeted LC-MS screening. Accurate retention time prediction facilitates annotation and is well established in proteomics. For small molecules, however, prediction accuracy was long limited by the lack of available experimental data. Recently introduced large databases of small-molecule retention times make reliable machine learning-based predictions possible across a wide diversity of compounds, and simple projections can extend these predictions to other LC systems and conditions. In this work, we describe a combined approach to predicting retention times for nano-HPLC: binary and regression gradient boosting models trained on the METLIN small-molecule dataset are applied in sequence, and the results are then projected onto nano-HPLC separations using a small number of easily available compounds. The proposed model outperforms previous machine learning approaches, with a mean absolute error of 46 s. After transfer to nano-LC conditions, the median absolute error is below 155 s (median relative error 10.8%). To illustrate the applicability of the approach, we eliminated on average 25 to 42% of false positives using a filter threshold derived from ROC curves. The proposed approach should therefore be used alongside other well-established in silico methods, and their integration may broaden the range of correctly identified molecules.
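The projection and filtering steps described in the abstract can be illustrated with a minimal sketch. It assumes a plain least-squares line as the projection, which may differ from the function the authors actually used (the abstract only calls it "simple"). All function names, retention times, and the threshold value below are hypothetical and chosen for illustration only. In practice the predicted retention times would come from the trained gradient boosting models, and the threshold from a ROC analysis.

```python
from statistics import median


def fit_linear_projection(predicted, observed):
    """Least-squares line mapping model-predicted RT to observed RT (seconds).

    Assumption: a straight line is adequate; the source abstract does not
    specify the form of the projection.
    """
    n = len(predicted)
    if n < 2 or n != len(observed):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    if sxx == 0:
        raise ValueError("calibration predictions must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def project(rt, slope, intercept):
    """Transfer a predicted RT to the target (e.g. nano-LC) time scale."""
    return slope * rt + intercept


def median_errors(projected, observed):
    """Return (median absolute error, median relative error)."""
    abs_err = [abs(p - o) for p, o in zip(projected, observed)]
    rel_err = [a / o for a, o in zip(abs_err, observed)]
    return median(abs_err), median(rel_err)


def filter_candidates(candidates, observed_rt, slope, intercept, threshold_s):
    """Keep candidate names whose projected RT lies within threshold_s
    seconds of the observed RT; the rest are treated as false positives."""
    return [
        name
        for name, predicted_rt in candidates
        if abs(project(predicted_rt, slope, intercept) - observed_rt) <= threshold_s
    ]


# Hypothetical calibration compounds: model-predicted vs. measured nano-LC RT.
calib_predicted = [100.0, 200.0, 300.0, 400.0]
calib_observed = [250.0, 450.0, 650.0, 850.0]
slope, intercept = fit_linear_projection(calib_predicted, calib_observed)

# Hypothetical candidate structures for one feature observed at 660 s.
candidates = [("candidate_A", 310.0), ("candidate_B", 150.0)]
kept = filter_candidates(candidates, 660.0, slope, intercept, threshold_s=155.0)

# Hypothetical held-out compounds used to evaluate the transfer.
med_abs, med_rel = median_errors([260.0, 440.0], [250.0, 450.0])
```

Only a handful of calibration compounds is needed to fit the two parameters of such a line, which matches the abstract's emphasis on "a small number of easily available compounds". A piecewise or non-linear mapping could be substituted for `fit_linear_projection` without changing the filtering step.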

Data availability

The source code of the retention time predictors and the pre-trained models are available on GitHub: https://github.com/osv91/RTpredict.

Funding

The research was supported by the Russian Scientific Foundation grant № 18-79-10127.

Author information

Contributions

The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

Corresponding authors

Correspondence to Eugene Nikolaev or Yury Kostyukevich.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

ESM 1

(PDF 312 kb).

About this article

Cite this article

Osipenko, S., Bashkirova, I., Sosnin, S. et al. Machine learning to predict retention time of small molecules in nano-HPLC. Anal Bioanal Chem 412, 7767–7776 (2020). https://doi.org/10.1007/s00216-020-02905-0

  • DOI: https://doi.org/10.1007/s00216-020-02905-0
