  • Perspective
  • Published:

Evaluation guidelines for machine learning tools in the chemical sciences

Abstract

Machine learning (ML) promises to tackle the grand challenges in chemistry and speed up the generation, improvement and/or ordering of research hypotheses. Despite the overarching applicability of ML workflows, one usually finds diverse evaluation study designs. The current heterogeneity in evaluation techniques and metrics leads to difficulty in (or the impossibility of) comparing and assessing the relevance of new algorithms. Ultimately, this may delay the digitalization of chemistry at scale and confuse method developers, experimentalists, reviewers and journal editors. In this Perspective, we critically discuss a set of method development and evaluation guidelines for different types of ML-based publications, emphasizing supervised learning. We provide a diverse collection of examples from various authors and disciplines in chemistry. While taking into account varying accessibility across research groups, our recommendations focus on reporting completeness and standardizing comparisons between tools. We aim to further contribute to improved ML transparency and credibility by suggesting a checklist of retro-/prospective tests and dissecting their importance. We envisage that the wide adoption and continuous update of best practices will encourage an informed use of ML on real-world problems related to the chemical sciences.

Fig. 1: Retrospective evaluation in ML.
Fig. 2: Comparison of ML utility relative to competing methods.
Fig. 3: Prospective evaluation of ML models.
Fig. 4: ML for knowledge augmentation.

References

  1. Gawehn, E., Hiss, J. A., Brown, J. B. & Schneider, G. Advancing drug discovery via GPU-based deep learning. Expert Opin. Drug Discov. 13, 579–582 (2018).

    Article  PubMed  Google Scholar 

  2. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

    Google Scholar 

  3. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 8024–8035 (2019).

    Google Scholar 

  4. Abadi, M. et al. in Proc. 12th USENIX Conf. Operating Syst. Design Implement. 265–283 (USENIX Association, 2016).

  5. Vamathevan, J. et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 18, 463–477 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O. & Walsh, A. Machine learning for molecular and materials science. Nature 559, 547–555 (2018).

    Article  CAS  PubMed  Google Scholar 

  7. Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. Science 361, 360–365 (2018).

    Article  CAS  PubMed  Google Scholar 

  8. Myszczynska, M. A. et al. Applications of machine learning to diagnosis and treatment of neurodegenerative diseases. Nat. Rev. Neurol. 16, 440–456 (2020).

    Article  PubMed  Google Scholar 

  9. Richens, J. G., Lee, C. M. & Johri, S. Improving the accuracy of medical diagnosis with causal machine learning. Nat. Commun. 11, 3923 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Yi, P. H., Malone, P., Lin, C. T. & Filice, R. W. Deep learning algorithms for interpretation of upper extremity radiographs: laterality and technologist initial labels as confounding factors. Am. J. Roentgenol. 218, 714–715 (2021).

    Article  Google Scholar 

  11. Liu, X. et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digital Health 1, e271–e297 (2019).

    Article  PubMed  Google Scholar 

  12. Tschandl, P. et al. Human–computer collaboration for skin cancer recognition. Nat. Med. 26, 1229–1234 (2020).

    Article  CAS  PubMed  Google Scholar 

  13. de Almeida, A. F., Moreira, R. & Rodrigues, T. Synthetic organic chemistry driven by artificial intelligence. Nat. Rev. Chem. 3, 589–604 (2019).

    Article  Google Scholar 

  14. Gromski, P. S., Henson, A. B., Granda, J. M. & Cronin, L. How to explore chemical space using algorithms and automation. Nat. Rev. Chem. 3, 119–128 (2019).

    Article  Google Scholar 

  15. Schneider, P. et al. Rethinking drug design in the artificial intelligence era. Nat. Rev. Drug Discov. 19, 353–364 (2020).

    Article  CAS  PubMed  Google Scholar 

  16. Strieth-Kalthoff, F., Sandfort, F., Segler, M. H. S. & Glorius, F. Machine learning the ropes: principles, applications and directions in synthetic chemistry. Chem. Soc. Rev. 49, 6154–6168 (2020).

    Article  CAS  PubMed  Google Scholar 

  17. Granda, J. M., Donina, L., Dragone, V., Long, D.-L. & Cronin, L. Controlling an organic synthesis robot with machine learning to search for new reactivity. Nature 559, 377–381 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Coley, C. W. et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science 365, eaax1566 (2019).

    Article  CAS  PubMed  Google Scholar 

  19. Gómez-Bombarelli, R. et al. Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach. Nat. Mater. 15, 1120–1127 (2016).

    Article  PubMed  Google Scholar 

  20. Shamay, Y. et al. Quantitative self-assembly prediction yields targeted nanomedicines. Nat. Mater. 17, 361–368 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Reker, D., Hoyt, E. A., Bernardes, G. J. L. & Rodrigues, T. Adaptive optimization of chemical reactions with minimal experimental information. Cell Rep. Phys. Sci. 1, 100247 (2020).

    Article  Google Scholar 

  22. Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).

    Article  CAS  PubMed  Google Scholar 

  23. Schreck, J. S., Coley, C. W. & Bishop, K. J. M. Learning retrosynthetic planning through simulated experience. ACS Cent. Sci. 5, 970–981 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  25. Tu, K. H. et al. Machine learning predictions of block copolymer self-assembly. Adv. Mater. 32, 2005713 (2020).

    Article  Google Scholar 

  26. Moret, M., Friedrich, L., Grisoni, F., Merk, D. & Schneider, G. Generative molecular design in low data regimes. Nat. Mach. Intell. 2, 171–180 (2020).

    Article  Google Scholar 

  27. Yao, Z. et al. Inverse design of nanoporous crystalline reticular materials with deep generative models. Nat. Mach. Intell. 3, 76–86 (2021).

    Article  Google Scholar 

  28. Segler, M. H. S., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 (2018).

    Article  CAS  PubMed  Google Scholar 

  29. Gao, T. & Lu, W. Machine learning toward advanced energy storage devices and systems. iScience 24, 101936 (2021).

    Article  CAS  PubMed  Google Scholar 

  30. Severson, K. A. et al. Data-driven prediction of battery cycle life before capacity degradation. Nat. Energy 4, 383–391 (2019).

    Article  Google Scholar 

  31. Rodrigues, T. et al. Machine intelligence decrypts β-lapachone as an allosteric 5-lipoxygenase inhibitor. Chem. Sci. 9, 6899–6903 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Conde, J. et al. Allosteric antagonist modulation of TRPV2 by piperlongumine impairs glioblastoma progression. ACS Cent. Sci. 7, 868–881 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710 (2020).

    Article  CAS  PubMed  Google Scholar 

  34. Wang, T. et al. Improved fragment sampling for ab initio protein structure prediction using deep neural networks. Nat. Mach. Intell. 1, 347–355 (2019).

    Article  Google Scholar 

  35. Tian, Y. et al. Determining multi-component phase diagrams with desired characteristics using active learning. Adv. Sci. 8, 2003165 (2020).

    Article  Google Scholar 

  36. Reker, D., Bernardes, G. J. L. & Rodrigues, T. Computational advances in combating colloidal aggregation in drug discovery. Nat. Chem. 11, 402–418 (2019).

    Article  CAS  PubMed  Google Scholar 

  37. Reker, D. et al. Computationally guided high-throughput design of self-assembling drug nanoparticles. Nat. Nanotech. 16, 725–733 (2021).

    Article  CAS  Google Scholar 

  38. Timmreck, R. et al. Characterization of tandem organic solar cells. Nat. Photon. 9, 478–479 (2015).

    Article  CAS  Google Scholar 

  39. Jones, D. T. Setting the standards for machine learning in biology. Nat. Rev. Mol. Cell Biol. 20, 659–660 (2019).

    Article  CAS  PubMed  Google Scholar 

  40. Walsh, I. et al. DOME: recommendations for supervised machine learning validation in biology. Nat. Mater. 18, 1122–1127 (2021).

    CAS  Google Scholar 

  41. Horstmeyer, R., Heintzmann, R., Popescu, G., Waller, L. & Yang, C. Standardizing the resolution claims for coherent microscopy. Nat. Photon. 10, 68–71 (2016).

    Article  CAS  Google Scholar 

  42. Faria, M. et al. Minimum information reporting in bio–nano experimental literature. Nat. Nanotech. 13, 777–785 (2018).

    Article  CAS  Google Scholar 

  43. Miernicki, M., Hofmann, T., Eisenberger, I., Kammer, F. V. D. & Praetorius, A. Legal and practical challenges in classifying nanomaterials according to regulatory definitions. Nat. Nanotech. 14, 208–216 (2019).

    Article  CAS  Google Scholar 

  44. Aldrich, C. et al. The ecstasy and agony of assay interference compounds. ACS Cent. Sci. 3, 143–147 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Jain, A. N. & Nicholls, A. Recommendations for evaluation of computational methods. J. Computer Aided Mol. Des. 22, 133–139 (2008).

    Article  CAS  Google Scholar 

  46. Artrith, N. et al. Best practices in machine learning for chemistry. Nat. Chem. 13, 505–508 (2021).

    Article  CAS  PubMed  Google Scholar 

  47. Alves, V. M. et al. SCAM detective: accurate predictor of small, colloidally aggregating molecules. J. Chem. Inf. Model. 60, 4056–4063 (2020).

    Article  CAS  PubMed  Google Scholar 

  48. Lee, K. et al. Combating small-molecule aggregation with machine learning. Cell Rep. Phys. Sci. 2, 100573 (2021).

    Article  Google Scholar 

  49. Bender, A. & Cortés-Ciriano, I. Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 1: Ways to make an impact, and why we are not there yet. Drug Discov. Today 26, 511–524 (2021).

    Article  CAS  PubMed  Google Scholar 

  50. Bender, A. & Cortes-Ciriano, I. Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: A discussion of chemical and biological data. Drug Discov. Today 26, 1040–1052 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Brown, S. P., Muchmore, S. W. & Hajduk, P. J. Healthy skepticism: assessing realistic model performance. Drug Discov. Today 14, 420–427 (2009).

    Article  PubMed  Google Scholar 

  52. Robinson, M. C., Glen, R. C. & Lee, A. A. Validating the validation: reanalyzing a large-scale comparison of deep learning and machine learning models for bioactivity prediction. J. Computer Aided Mol. Des. 34, 717–730 (2020).

    Article  CAS  Google Scholar 

  53. Cichońska, A. et al. Crowdsourced mapping of unexplored target space of kinase inhibitors. Nat. Commun. 12, 3307 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  54. Brown, N., Fiscato, M., Segler, M. H. S. & Vaucher, A. C. GuacaMol: benchmarking models for de novo molecular design. J. Chem. Inf. Model. 59, 1096–1108 (2019).

    Article  CAS  PubMed  Google Scholar 

  55. Raji, I. D., Bender, E. M., Paullada, A., Denton, E. & Hanna, A. AI and the everything in the whole wide world benchmark. Preprint at arXiv https://arxiv.org/abs/2111.15366 (2021).

  56. Renz, P., Rompaey, D. V., Wegner, J. K., Hochreiter, S. & Klambauer, G. On failure modes in molecule generation and optimization. Drug Discov. Today Technol. 32–33, 55–63 (2019).

    Article  PubMed  Google Scholar 

  57. Chen, L. et al. Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening. PLoS ONE 14, e0220113 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  58. Wallach, I. & Heifets, A. Most ligand-based classification benchmarks reward memorization rather than generalization. J. Chem. Inf. Model. 58, 916–932 (2018).

    Article  CAS  PubMed  Google Scholar 

  59. Sieg, J., Flachsenberg, F. & Rarey, M. In need of bias control: evaluating chemical data for machine learning in structure-based virtual screening. J. Chem. Inf. Model. 59, 947–961 (2019).

    Article  CAS  PubMed  Google Scholar 

  60. Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).

    Article  CAS  PubMed  Google Scholar 

  61. Stanley, M. et al. in 35th Conf. Neural Inform. Process. Syst. Datasets Benchmarks Track (NeurIPS, 2021).

  62. Thakkar, A., Kogej, T., Reymond, J.-L., Engkvist, O. & Bjerrum, E. J. Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain. Chem. Sci. 11, 154–168 (2020).

    Article  CAS  PubMed  Google Scholar 

  63. Chen, G. et al. Alchemy: a quantum chemistry dataset for benchmarking AI models. Preprint at arXiv https://arxiv.org/abs/1906.09427 (2019).

  64. Rodrigues, T. The good, the bad, and the ugly in chemical and biological data for machine learning. Drug Discov. Today Technol. 32–33, 3–8 (2019).

    Article  PubMed  Google Scholar 

  65. Heil, B. J. et al. Reproducibility standards for machine learning in the life sciences. Nat. Mater. 18, 1132–1135 (2021).

    CAS  Google Scholar 

  66. McCloskey, K. et al. Machine learning on DNA-encoded libraries: a new paradigm for hit finding. J. Med. Chem. 63, 8857–8866 (2020).

    Article  CAS  PubMed  Google Scholar 

  67. Giblin, K. A., Hughes, S. J., Boyd, H., Hansson, P. & Bender, A. Prospectively validated proteochemometric models for the prediction of small-molecule binding to bromodomain proteins. J. Chem. Inf. Model. 58, 1870–1888 (2018).

    Article  CAS  PubMed  Google Scholar 

  68. Mathai, N., Chen, Y. & Kirchmair, J. Validation strategies for target prediction methods. Brief. Bioinform. 21, 791–802 (2020).

    Article  CAS  PubMed  Google Scholar 

  69. Mitchell, J. B. O. Machine learning methods in chemoinformatics. Wiley Interdiscip. Rev. Comput. Mol. Sci. 4, 468–481 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  70. Vishwakarma, G., Sonpal, A. & Hachmann, J. Metrics for benchmarking and uncertainty quantification: quality, applicability, and a path to best practices for machine learning in chemistry. Preprint at arXiv https://arxiv.org/abs/2010.00110 (2020).

  71. Rosario, Z. D., Rupp, M., Kim, Y., Antono, E. & Ling, J. Assessing the frontier: active learning, model accuracy, and multi-objective candidate discovery and optimization. J. Chem. Phys. 153, 024112 (2020).

    Article  PubMed  Google Scholar 

  72. Schwaller, P., Gaudin, T., Lányi, D., Bekas, C. & Laino, T. “Found in translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models. Chem. Sci. 9, 6091–6098 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  73. Yu, T. & Zhu, H. Hyper-parameter optimization: a review of algorithms and applications. Preprint at arXiv https://arxiv.org/abs/2003.05689 (2020).

  74. Schwaller, P., Vaucher, A. C., Laino, T. & Reymond, J.-L. Prediction of chemical reaction yields using deep learning. Mach. Learn. Sci. Technol. 2, 0115016 (2021).

    Article  Google Scholar 

  75. Sandfort, F., Strieth-Kalthoff, F., Kühnemund, M., Beecks, C. & Glorius, F. A structure-based platform for predicting chemical reactivity. Chem 6, 1379–1390 (2020).

    Article  CAS  Google Scholar 

  76. Scikit-learn Developers. Cross-validation: evaluating estimator performance. Scikit https://scikit-learn.org/stable/modules/cross_validation.html (2021).

  77. Sheridan, R. P. Time-split cross-validation as a method for estimating the goodness of prospective prediction. J. Chem. Inf. Model. 53, 783–790 (2013).

    Article  CAS  PubMed  Google Scholar 

  78. Pesciullesi, G., Schwaller, P., Laino, T. & Reymond, J.-L. Transfer learning enables the molecular transformer to predict regio- and stereoselective reactions on carbohydrates. Nat. Commun. 11, 4874 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  79. Ho, S. Y., Phua, K., Wong, L. & Goh, W. W. B. Extensions of the external validation for checking learned model interpretability and generalizability. Patterns 1, 100129 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  80. Alexander, D. L. J., Tropsha, A. & Winkler, D. A. Beware of R2: simple, unambiguous assessment of the prediction accuracy of QSAR and QSPR models. J. Chem. Inf. Model. 55, 1316–1322 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Golbraikh, A. & Tropsha, A. Beware of q2! J. Mol. Graph. Model. 20, 269–276 (2002).

    Article  CAS  PubMed  Google Scholar 

  82. Consonni, V., Davide, B. & Todeschini, R. Comments on the definition of the Q2 parameter for QSAR validation. J. Chem. Inf. Model. 49, 1669–1678 (2009).

    Article  CAS  PubMed  Google Scholar 

  83. Derumigny, A. & Fermanian, J.-D. A classification point-of-view about conditional Kendall’s tau. Preprint at arXiv https://arxiv.org/abs/1806.09048 (2018).

  84. Raeder, T., Forman, G. & Chawla, N. V. in Data Mining: Foundations and Intelligent Paradigms (eds Holmes, D. E. & Jain, L. C.) 315–331 (Springer, 2012).

  85. Brown, J. B. Classifiers and their metrics quantified. Mol. Inf. 37, 1700127 (2018).

    Article  Google Scholar 

  86. Beker, W., Wołos, A., Szymkuć, S. & Grzybowski, B. A. Minimal-uncertainty prediction of general drug-likeness based on Bayesian neural networks. Nat. Mach. Intell. 2, 457–465 (2020).

    Article  Google Scholar 

  87. Perryman, A. L., Inoyama, D., Patel, J. S., Ekins, S. & Freundlich, J. S. Pruned machine learning models to predict aqueous solubility. ACS Omega 5, 16562–16567 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  88. Schwaller, P. et al. Mapping the space of chemical reactions using attention-based neural networks. Nat. Mach. Intell. 3, 144–152 (2021).

    Article  Google Scholar 

  89. Schwaller, P. et al. Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy. Chem. Sci. 11, 3316–3325 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  90. Mo, Y. et al. Evaluating and clustering retrosynthesis pathways with learned strategy. Chem. Sci. 12, 1469–1478 (2021).

    Article  CAS  Google Scholar 

  91. Talebian, S. et al. Facts and figures on materials science and nanotechnology progress and investment. ACS Nano 15, 15940–15952 (2021).

    Article  CAS  PubMed  Google Scholar 

  92. Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminf. 9, 48 (2017).

    Article  Google Scholar 

  93. Blaschke, T., Engkvist, O., Bajorath, J. & Chen, H. Memory-assisted reinforcement learning for diverse molecular de novo design. J. Cheminf. 12, 68 (2020).

    Article  CAS  Google Scholar 

  94. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  95. Haghighi, S., Jasemi, M., Hessabi, S. & Zolanvari, A. PyCM: multiclass confusion matrix library in Python. J. Open Source Softw. 3, 729 (2018).

    Article  Google Scholar 

  96. Beker, W., Gajewska, E. P., Badowski, T. & Grzybowski, B. A. Prediction of major regio-, site-, and diastereoisomers in Diels–Alder reactions by using machine-learning: the importance of physically meaningful descriptors. Angew. Chem. Int. Ed. 58, 4515–4519 (2019).

    Article  CAS  Google Scholar 

  97. Has¨e, F., Roch, Lc. M., Kreisbeck, C. & Aspuru-Guzik, A. Phoenics: a Bayesian optimizer for chemistry. ACS Cent. Sci. 4, 1134–1145 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  98. Nielsen, M. K., Ahneman, D. T., Riera, O. & Doyle, A. G. Deoxyfluorination with sulfonyl fluorides: navigating reaction space with machine learning. J. Am. Chem. Soc. 140, 5004–5008 (2018).

    Article  CAS  PubMed  Google Scholar 

  99. MacLeod, B. P. et al. Self-driving laboratory for accelerated discovery of thin-film materials. Sci. Adv. 6, eaaz8867 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  100. Walters, W. P. & Murcko, M. Assessing the impact of generative AI on medicinal chemistry. Nat. Biotechnol. 38, 143–145 (2020).

    Article  CAS  PubMed  Google Scholar 

  101. Aickin, M. & Gensler, H. Adjusting for multiple testing when reporting research results: the Bonferroni vs Holm methods. Am. J. Public Health 86, 726–728 (1996).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  102. Chuang, K. V. & Keiser, M. J. Adversarial controls for scientific machine learning. ACS Chem. Biol. 13, 2819–2831 (2018).

    Article  CAS  PubMed  Google Scholar 

  103. Ahneman, D. T., Estrada, J. G., Lin, S., Dreher, S. D. & Doyle, A. G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 360, 186–190 (2018).

    Article  CAS  PubMed  Google Scholar 

  104. Chuang, K. V. & Keiser, M. J. Comment on “Predicting reaction performance in C–N cross-coupling using machine learning”. Science 362, eaat8603 (2018).

    Article  PubMed  Google Scholar 

  105. Schwaller, P. et al. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Cent. Sci. 5, 1572–1583 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  106. Maragakis, P., Nisonoff, H., Cole, B. & Shaw, D. E. A deep-learning view of chemical space designed to facilitate drug discovery. J. Chem. Inf. Model. 60, 4487–4496 (2020).

    Article  CAS  PubMed  Google Scholar 

  107. Zahrt, A. F. et al. Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning. Science 363, eaau5631 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  108. Reid, J. P., Proctor, R. S. J., Sigman, M. S. & Phipps, R. J. Predictive multivariate linear regression analysis guides successful catalytic enantioselective Minisci reactions of diazines. J. Am. Chem. Soc. 141, 19178–19185 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  109. Brix, K. V., DeForest, D. K., Tear, L., Grose, M. & Adam, W. J. Use of multiple linear regression models for setting water quality criteria for copper: a complementary approach to the biotic ligand model. Environ. Sci. Technol. 51, 5182–5192 (2017).

    Article  CAS  PubMed  Google Scholar 

  110. Toste, F. D., Sigman, M. S. & Miller, S. J. Pursuit of noncovalent interactions for strategic site-selective catalysis. Acc. Chem. Res. 50, 609–615 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  111. Reid, J. P. & Sigman, M. S. Holistic prediction of enantioselectivity in asymmetric catalysis. Nature 571, 343–348 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  112. Zahrt, A. F., Athavale, S. V. & Denmark, S. E. Quantitative structure–selectivity relationships in enantioselective catalysis: past, present, and future. Chem. Rev. 120, 1620–1689 (2020).

    Article  CAS  PubMed  Google Scholar 

  113. Rodrigues, T. Deriving intuition in catalyst design with machine learning. Chem 8, 15–17 (2022).

    Article  CAS  Google Scholar 

  114. Liu, B. et al. Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS Cent. Sci. 3, 1103–1113 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  115. Dai, H., Li, C., Coley, C. W., Dai, B. & Song, L. Retrosynthesis prediction with conditional graph logic network. Preprint at arXiv https://arxiv.org/abs/2001.01408 (2020).

  116. Vaucher, A. C. et al. Inferring experimental procedures from text-based representations of chemical reactions. Nat. Commun. 12, 2573 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  117. Gillet, V. J., Willett, P. & Bradshaw, J. Identification of biological activity profiles using substructural analysis and genetic algorithms. J. Chem. Inf. Comput. Sci. 38, 165–179 (1998).

    Article  CAS  PubMed  Google Scholar 

  118. Edgar, S. J., Holliday, J. D. & Willett, P. Effectiveness of retrieval in similarity searches of chemical databases: a review of performance measures. J. Mol. Graph. Model. 18, 343–357 (2000).

    Article  CAS  PubMed  Google Scholar 

  119. Schneider, G. & Böhm, H.-J. Virtual screening and fast automated docking methods. Drug Discov. Today 7, 64–70 (2002).

    Article  CAS  PubMed  Google Scholar 

  120. Coley, C. W., Rogers, L., Green, W. H. & Jensen, K. F. Computer-assisted retrosynthesis based on molecular similarity. ACS Cent. Sci. 3, 1237–1245 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  121. Capecchi, A., Probst, D. & Reymond, J.-L. One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome. J. Cheminf. 12, 43 (2020).

    Article  CAS  Google Scholar 

  122. Rodrigues, T., Almeida, B. P. D., Barbosa-Morais, N. L. & Bernardes, G. J. L. Dissecting celastrol with machine learning to unveil dark pharmacology. Chem. Commun. 55, 6369–6372 (2019).

    Article  CAS  Google Scholar 

  123. Rodrigues, T. et al. De novo fragment design for drug discovery and chemical biology. Angew. Chem. Int. Ed. 54, 15079–15083 (2015).

    Article  CAS  Google Scholar 

  124. Häse, F., Roch, L. M., Friederich, P. & Aspuru-Guzik, A. Designing and understanding light-harvesting devices with machine learning. Nat. Commun. 11, 4587 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  125. Zhavoronkov, A. et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 37, 1038–1040 (2019).

    Article  CAS  PubMed  Google Scholar 

  126. Moret, M., Helmstädter, M., Grisoni, F., Schneider, G. & Merk, D. Beam search for automated design and scoring of novel ROR ligands with machine intelligence. Angew. Chem. Int. Ed. 60, 19477–19482 (2021).

    Article  CAS  Google Scholar 

  127. Kearnes, S. Pursuing a prospective perspective. Trends Chem. 3, 77–79 (2021).

    Article  Google Scholar 

  128. Deringer, V. L. et al. Origins of structural and electronic transitions in disordered silicon. Nature 589, 59–64 (2021).

    Article  CAS  PubMed  Google Scholar 

  129. Porwol, L. et al. An autonomous chemical robot discovers the rules of inorganic coordination chemistry without prior knowledge. Angew. Chem. Int. Ed. 59, 11256–11261 (2020).

    Article  CAS  Google Scholar 

  130. Kurczab, R., Smusz, S. & Bojarski, A. J. The influence of negative training set size on machine learning-based virtual screening. J. Cheminf. 6, 32 (2014).

    Article  Google Scholar 

  131. Lewis, R. A., Ertl, P., Schneider, N. & Stiefl, N. Reducing the concepts of data science and machine learning to tools for the bench chemist. Chimia 73, 1001–1005 (2019).

    Article  CAS  PubMed  Google Scholar 

  132. Reutlinger, M., Rodrigues, T., Schneider, P. & Schneider, G. Multi-objective molecular de novo design by adaptive fragment prioritization. Angew. Chem. Int. Ed. 53, 4244–4248 (2014).

    Article  CAS  Google Scholar 

  133. Anders, C. J., Montavon, G., Samek, W. & Müller, K.-R. in Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (eds Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K. & Müller, K.-R.) 297–309 (Springer, 2019).

  134. Jiménez-Luna, J., Grisoni, F. & Schneider, G. Drug discovery with explainable artificial intelligence. Nat. Mach. Intell. 2, 573–584 (2020).

    Article  Google Scholar 

  135. Sheridan, R. P. Interpretation of QSAR models by coloring atoms according to changes in predicted activity: how robust is it? J. Chem. Inf. Model. 59, 1324–1337 (2019).

    Article  CAS  PubMed  Google Scholar 

  136. Matveieva, M. & Polishchuk, P. Benchmarks for interpretation of QSAR models. J. Cheminf. 13, 41 (2021).

    Article  CAS  Google Scholar 

  137. Lundberg, S. M. et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2, 749–760 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  138. Ribeiro, M. T., Singh, S. & Guestrin, C. “Why should I trust you?”: explaining the predictions of any classifier. Preprint at arXiv https://arxiv.org/abs/1602.04938 (2016).

  139. Gao, H. et al. Using machine learning to predict suitable conditions for organic reactions. ACS Cent. Sci. 4, 1465–1476 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  140. Zhong, M. et al. Accelerated discovery of CO2 electrocatalysts using active machine learning. Nature 581, 178–184 (2020).

    Article  CAS  PubMed  Google Scholar 

  141. Riniker, S. & Landrum, G. A. Similarity maps — a visualization strategy for molecular fingerprints and machine-learning methods. J. Cheminf. 5, 43 (2013).

    Article  CAS  Google Scholar 

  142. Friederich, P., Krenn, M., Tamblyn, I. & Aspuru-Guzik, A. Scientific intuition inspired by machine learning generated hypotheses. Mach. Learn. Sci. Technol. 2, 025027 (2021).

    Article  Google Scholar 

  143. Webel, H. E. et al. Revealing cytotoxic substructures in molecules using deep learning. J. Computer Aided Mol. Des. 34, 731–746 (2020).

    Article  CAS  Google Scholar 

  144. Singh, S. et al. A unified machine-learning protocol for asymmetric catalysis as a proof of concept demonstration using asymmetric hydrogenation. Proc. Natl Acad. Sci. USA 117, 1339–1345 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  145. Coley, C. W. et al. A graph-convolutional neural network model for the prediction of chemical reactivity. Chem. Sci. 10, 370–377 (2019).

    Article  CAS  PubMed  Google Scholar 

  146. Reker, D. & Schneider, G. Active-learning strategies in computer-assisted drug discovery. Drug Discov. Today 20, 458–465 (2015).

    Article  PubMed  Google Scholar 

  147. Reutlinger, M. et al. Chemically Advanced Template Search (CATS) for scaffold-hopping and prospective target prediction for ‘orphan’ molecules. Mol. Inf. 32, 133–138 (2013).

    Article  CAS  Google Scholar 

  148. Reker, D., Schneider, P. & Schneider, G. Multi-objective active machine learning rapidly improves structure–activity models and reveals new protein–protein interaction inhibitors. Chem. Sci. 7, 3919–3927 (2016).

  149. Schwaller, P., Hoover, B., Reymond, J.-L., Strobelt, H. & Laino, T. Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Sci. Adv. 7, eabe4166 (2021).

  150. Burger, B. et al. A mobile robotic chemist. Nature 583, 237–241 (2020).

  151. Gromski, P. S., Granda, J. M. & Cronin, L. Universal chemical synthesis and discovery with ‘The Chemputer’. Trends Chem. 2, 4–12 (2020).

  152. Turing, A. M. Computing machinery and intelligence. Mind 59, 433–460 (1950).

  153. Mikulak-Klucznik, B. et al. Computational planning of the synthesis of complex natural products. Nature 588, 83–88 (2020).

  154. Duros, V. et al. Human versus robots in the discovery and crystallization of gigantic polyoxometalates. Angew. Chem. Int. Ed. 56, 10815–10820 (2017).

  155. Klucznik, T. et al. Efficient syntheses of diverse, medicinally relevant targets planned by computer and executed in the laboratory. Chem 4, 522–532 (2018).

  156. Shields, B. J. et al. Bayesian reaction optimization as a tool for chemical synthesis. Nature 590, 89–96 (2021).

  157. Polykovskiy, D. et al. Molecular Sets (MOSES): a benchmarking platform for molecular generation models. Front. Pharmacol. 11, 1931 (2020).

  158. Arús-Pous, J. et al. Exploring the GDB-13 chemical space using deep generative models. J. Cheminf. 11, 20 (2019).

  159. Mysinger, M. M., Carchia, M., Irwin, J. J. & Shoichet, B. K. Directory of Useful Decoys, Enhanced (DUD-E): better ligands and decoys for better benchmarking. J. Med. Chem. 55, 6582–6594 (2012).

  160. Lowe, D. M. Extraction of Chemical Structures and Reactions from the Literature. Thesis, Univ. Cambridge (2012).

  161. Axelrod, S. & Gómez-Bombarelli, R. GEOM: energy-annotated molecular conformations for property prediction and molecular generation. Preprint at arXiv https://arxiv.org/abs/2006.05531 (2020).

  162. Wang, R., Fang, X., Lu, Y., Yang, C.-Y. & Wang, S. The PDBbind database: methodologies and updates. J. Med. Chem. 48, 4111–4119 (2005).

  163. García-Ortegón, M. et al. DOCKSTRING: easy molecular docking yields better benchmarks for ligand design. Preprint at arXiv https://arxiv.org/abs/2110.15486 (2021).

  164. Sun, J. et al. ExCAPE-DB: an integrated large scale dataset facilitating big data analysis in chemogenomics. J. Cheminf. 9, 17 (2017).

  165. Segler, M. H. S. & Waller, M. P. Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chem. Eur. J. 23, 5966–5971 (2017).

  166. McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at arXiv https://arxiv.org/abs/1802.03426 (2020).

Acknowledgements

T.R. acknowledges FCT Portugal for funding (CEECIND/00684/2018). T.R. thanks colleagues for discussions on the topic presented here over the years. D. Reker is acknowledged for providing access to original data discussed in the manuscript. T.R., O.E. and A.B. acknowledge that not all suggested evaluation studies might simultaneously be found in their own original research manuscripts. We thank M. Thomas and M. Garcia-Ortegon for help with Table 1.

Author information

Contributions

All authors contributed to the discussion and writing of the manuscript.

Corresponding author

Correspondence to Tiago Rodrigues.

Ethics declarations

Competing interests

T.R. is a co-founder and shareholder of TargTex and has acted as consultant to the pharmaceutical industry. A.B. is a co-founder and shareholder of Healx, PharmEnable and Terra Lumina and acts as a consultant to various pharmaceutical companies.

Peer review

Peer review information

Nature Reviews Chemistry thanks F. Grisoni and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

DOCKSTRING: https://github.com/dockstring/dockstring

DUD-E: http://dude.docking.org/

ExCAPE: https://solr.ideaconsult.net/search/excape/

FS-Mol: https://github.com/microsoft/FS-Mol

GDB-13: https://gdb.unibe.ch/downloads/

GEOM: https://github.com/learningmatter-mit/geom

GuacaMol: https://github.com/BenevolentAI/guacamol

Kaggle competitions: http://www.kaggle.com/

MoleculeNet: https://moleculenet.org/

MOSES: https://github.com/molecularsets/moses

PDBbind: http://www.pdbbind.org.cn/

RXNMapper: http://rxnmapper.ai/

SAMPL blind challenges: http://www.samplchallenges.org/

USPTO: https://figshare.com/articles/dataset/Chemical_reactions_from_US_patents_1976-Sep2016_/5104873

Cite this article

Bender, A., Schneider, N., Segler, M. et al. Evaluation guidelines for machine learning tools in the chemical sciences. Nat Rev Chem 6, 428–442 (2022). https://doi.org/10.1038/s41570-022-00391-9

  • DOI: https://doi.org/10.1038/s41570-022-00391-9
