Organic reactivity from mechanism to machine learning

Jorner, Kjell; Tomberg, Anna; Bauer, Christoph; Sköld, Christian; Norrby, Per-Ola

doi:10.1038/s41570-021-00260-x

Review Article
Published: 16 March 2021

Nature Reviews Chemistry volume 5, pages 240–255 (2021)

A Publisher Correction to this article was published on 22 March 2021

This article has been updated

Abstract

As more data are used to build models of chemical reactivity, the mechanistic component can be reduced until ‘big data’ applications are reached. Such methods no longer depend on underlying mechanistic hypotheses; instead, they may learn them implicitly from extensive training data. Reactivity models often focus on reaction barriers, but can also be trained to predict lab-relevant properties, such as yields or conditions, directly. Calculations with a quantum-mechanical component are still preferred for quantitative predictions of reactivity. Although big data applications tend to be more qualitative, they have the advantage of being broadly applicable to different kinds of reactions. Between these extremes lies a continuum of methods, such as those that use quantum-derived data or descriptors in machine learning models. Here, we present an overview of recent machine learning applications in the field of chemical reactivity from a mechanistic perspective. Starting with a summary of how reactivity questions are addressed by quantum-mechanical methods, we discuss methods that augment or replace quantum-based modelling with faster alternatives relying on machine learning.
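As a minimal sketch of two ideas the abstract connects, predicting a reaction barrier from a descriptor and reading that barrier as lab-relevant kinetics, the Python below fits a one-descriptor least-squares model and converts the predicted Gibbs activation energy into a rate constant with the Eyring equation, k = (k_B T / h) exp(−ΔG‡ / RT). It is an illustration under stated assumptions, not the authors' method: the descriptor values and barriers are invented, the transmission coefficient is taken as 1, and single-variable linear regression stands in for the multivariate and machine learning models the review covers. The function names are hypothetical.

```python
import math

# Physical constants in SI units (k_B and h are exact in the 2019 SI)
K_B = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34      # Planck constant, J s
R = 8.314462618         # molar gas constant, J/(mol K)


def fit_single_descriptor(x, y):
    """Hypothetical helper: closed-form least-squares fit of y ~ a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def eyring_rate_constant(dg_act_kj_mol, temperature=298.15):
    """Unimolecular rate constant (s^-1) from a Gibbs activation energy in kJ/mol,
    k = (k_B T / h) * exp(-dG / (R T)), assuming a transmission coefficient of 1."""
    dg_j_mol = dg_act_kj_mol * 1000.0  # convert kJ/mol to J/mol
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-dg_j_mol / (R * temperature))


if __name__ == "__main__":
    # Invented training data: one hypothetical descriptor vs. barrier (kJ/mol)
    descriptor = [0.0, 1.0, 2.0, 3.0]
    barrier = [60.0, 70.0, 80.0, 90.0]
    a, b = fit_single_descriptor(descriptor, barrier)

    new_descriptor = 2.5                      # hypothetical new substrate
    predicted_barrier = a + b * new_descriptor
    k = eyring_rate_constant(predicted_barrier)
    half_life = math.log(2) / k               # first-order half-life, s
    print(f"predicted barrier: {predicted_barrier:.1f} kJ/mol")
    print(f"k = {k:.3e} s^-1, half-life = {half_life:.0f} s")
```

A useful sanity check on the conversion: at 298 K, each increase of RT ln 10 ≈ 5.7 kJ/mol in ΔG‡ lowers k by a factor of ten, so modest errors in a predicted barrier become large errors in a predicted rate.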

**Fig. 1: Approaches for modelling chemical reactivity.**

**Fig. 2: An example reaction profile of a simple E2 elimination.**

**Fig. 3: Molecular mechanics methods for generating transition states.**

**Fig. 4: Reactivity predictions from quantum mechanics data.**

**Fig. 5: Experiment versus prediction using a descriptor-based model.**

**Fig. 6: Different types of reaction fingerprints.**

Change history

22 March 2021
A Correction to this paper has been published: https://doi.org/10.1038/s41570-021-00272-7

References

Engkvist, O. et al. Computational prediction of chemical reactions: current status and outlook. Drug Discov. Today 23, 1203–1218 (2018).
Article CAS PubMed Google Scholar
de Almeida, A. F., Moreira, R. & Rodrigues, T. Synthetic organic chemistry driven by artificial intelligence. Nat. Rev. Chem. 3, 589–604 (2019).
Article Google Scholar
Struble, T. J. et al. Current and future roles of artificial intelligence in medicinal chemistry synthesis. J. Med. Chem. 63, 8667–8682 (2020).
Article CAS PubMed PubMed Central Google Scholar
Coley, C. W., Eyke, N. S. & Jensen, K. F. Autonomous discovery in the chemical sciences part II: outlook. Angew. Chem. Int. Ed. 59, 23414–23436 (2020).
Article CAS Google Scholar
Zahrt, A. F., Athavale, S. V. & Denmark, S. E. Quantitative structure–selectivity relationships in enantioselective catalysis: past, present, and future. Chem. Rev. 120, 1620–1689 (2020).
Article CAS PubMed Google Scholar
Reid, J. P. & Sigman, M. S. Comparing quantitative prediction methods for the discovery of small-molecule chiral catalysts. Nat. Rev. Chem. 2, 290–305 (2018).
Article CAS Google Scholar
Cramer, C. J. Essentials of Computational Chemistry: Theories and Models 2nd edn (Wiley, 2004).
Maskill, H. The Physical Basis of Organic Chemistry (Oxford Univ. Press, 1985).
Eyring, H. The activated complex in chemical reactions. J. Chem. Phys. 3, 107–115 (1935).
Article CAS Google Scholar
Clot, E. & Norrby, P.-O. in Innovative Catalysis in Organic Synthesis: Oxidation, Hydrogenation, and C-X Bond Forming Reactions (ed. Andersson, P. G.) (Wiley, 2012).
Kozuch, S. & Shaik, S. How to conceptualize catalytic cycles? The energetic span model. Acc. Chem. Res. 44, 101–110 (2011).
Article CAS PubMed Google Scholar
Plata, R. E. & Singleton, D. A. A case study of the mechanism of alcohol-mediated Morita Baylis–Hillman reactions. The importance of experimental observations. J. Am. Chem. Soc. 137, 3811–3826 (2015).
Article CAS PubMed PubMed Central Google Scholar
Jorner, K., Brinck, T., Norrby, P.-O. & Buttar, D. Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies. Chem. Sci. 12, 1163–1175 (2021).
Article CAS Google Scholar
Maeda, S. & Ohno, K. Global mapping of equilibrium and transition structures on potential energy surfaces by the scaled hypersphere search method: applications to ab initio surfaces of formaldehyde and propyne molecules. J. Phys. Chem. A 109, 5742–5753 (2005).
Article CAS PubMed Google Scholar
Nett, A. J., Zhao, W., Zimmerman, P. M. & Montgomery, J. Highly active nickel catalysts for C–H functionalization identified through analysis of off-cycle intermediates. J. Am. Chem. Soc. 137, 7636–7639 (2015).
Article CAS PubMed Google Scholar
Hansen, E., Rosales, A. R., Tutkowski, B., Norrby, P.-O. & Wiest, O. Prediction of stereochemistry using Q2MM. Acc. Chem. Res. 49, 996–1005 (2016).
Article CAS PubMed PubMed Central Google Scholar
Houk, K. N. & Liu, F. Holy grails for computational organic chemistry and biochemistry. Acc. Chem. Res. 50, 539–543 (2017).
Article CAS PubMed Google Scholar
Guan, Y., Ingman, V. M., Rooks, B. J. & Wheeler, S. E. AARON: an automated reaction optimizer for new catalysts. J. Chem. Theory Comput. 14, 5249–5261 (2018).
Article CAS PubMed Google Scholar
Maeda, S., Ohno, K. & Morokuma, K. Systematic exploration of the mechanism of chemical reactions: the global reaction route mapping (GRRM) strategy using the ADDF and AFIR methods. Phys. Chem. Chem Phys 15, 3683–3701 (2013).
Article CAS PubMed Google Scholar
Bannwarth, C. et al. Extended tight-binding quantum chemistry methods. Wiley Interdiscip. Rev. Comput. Mol. Sci. 11, e1493 (2020).
Google Scholar
Grimme, S. et al. Fully automated quantum-chemistry-based computation of spin–spin-coupled nuclear magnetic resonance spectra. Angew. Chem. Int. Ed. 56, 14763–14769 (2017).
Article CAS Google Scholar
Koerstz, M., Christensen, A. S., Mikkelsen, K. V., Nielsen, M. B. & Jensen, J. H. High throughput virtual screening of 230 billion molecular solar heat battery candidates. PeerJ Phys. Chem. 3, e16 (2021).
Article Google Scholar
Kromann, J. C., Jensen, J. H., Kruszyk, M., Jessing, M. & Jørgensen, M. Fast and accurate prediction of the regioselectivity of electrophilic aromatic substitution reactions. Chem. Sci. 9, 660–665 (2018).
Article CAS PubMed Google Scholar
Hwang, M. J., Stockfisch, T. P. & Hagler, A. T. Derivation of class II force fields. 2. Derivation and characterization of a class II force field, CFF93, for the alkyl functional group and alkane molecules. J. Am. Chem. Soc. 116, 2515–2525 (1994).
Article CAS Google Scholar
Senftle, T. P. et al. The ReaxFF reactive force-field: development, applications and future directions. NPJ Comput. Mater. 2, 15011 (2016).
Article CAS Google Scholar
Jensen, F. Introduction to Computational Chemistry 3rd edn (Wiley, 2017).
Jensen, F. Locating minima on seams of intersecting potential energy surfaces. An application to transition structure modeling. J. Am. Chem. Soc. 114, 1596–1603 (1992).
Article CAS Google Scholar
Eksterowicz, J. E. & Houk, K. N. Transition-state modeling with empirical force fields. Chem. Rev. 93, 2439–2461 (1993).
Article CAS Google Scholar
Åqvist, J. & Warshel, A. Simulation of enzyme reactions using valence bond force fields and other hybrid quantum/classical approaches. Chem. Rev. 93, 2523–2544 (1993).
Article Google Scholar
Hartke, B. & Grimme, S. Reactive force fields made simple. Phys. Chem. Chem. Phys. 17, 16715–16718 (2015).
Article CAS PubMed Google Scholar
Weill, N., Corbeil, C. R., De Schutter, J. W. & Moitessier, N. Toward a computational tool predicting the stereochemical outcome of asymmetric reactions: development of the molecular mechanics-based program ACE and application to asymmetric epoxidation reactions. J. Comput. Chem. 32, 2878–2889 (2011).
Article CAS PubMed Google Scholar
Sherrod, M. J. & Menger, F. M. “Transition-state modeling” does not always model transition states. J. Am. Chem. Soc. 111, 2611–2613 (1989).
Article CAS Google Scholar
Rosales, A. R. et al. Rapid virtual screening of enantioselective catalysts using CatVS. Nat. Catal. 2, 41–45 (2019).
Article CAS Google Scholar
Rosales, A. R. et al. Transition state force field for the asymmetric redox-relay Heck reaction. J. Am. Chem. Soc. 142, 9700–9707 (2020).
CAS PubMed PubMed Central Google Scholar
Rosales, A. R. et al. Application of Q2MM to predictions in stereoselective synthesis. Chem. Commun. 54, 8294–8311 (2018).
Article CAS Google Scholar
Burai Patrascu, M. et al. From desktop to benchtop with automated computational workflows for computer-aided design in asymmetric catalysis. Nat. Catal. 3, 574–584 (2020).
Article CAS Google Scholar
Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 (2017).
Article CAS PubMed PubMed Central Google Scholar
Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules. Sci. Data 4, 170193 (2017).
Article CAS PubMed PubMed Central Google Scholar
Smith, J. S., Nebgen, B., Lubbers, N., Isayev, O. & Roitberg, A. E. Less is more: sampling chemical space with active learning. J. Chem. Phys. 148, 241733 (2018).
Article PubMed Google Scholar
Kang, P.-L., Shang, C. & Liu, Z.-P. Glucose to 5-hydroxymethylfurfural: origin of site-selectivity resolved by machine learning based reaction sampling. J. Am. Chem. Soc. 141, 20525–20536 (2019).
Article CAS PubMed Google Scholar
Grambow, C. A., Pattanaik, L. & Green, W. H. Deep learning of activation energies. J. Phys. Chem. Lett. 11, 2992–2997 (2020).
Article CAS PubMed PubMed Central Google Scholar
Grambow, C. A., Pattanaik, L. & Green, W. H. Reactants, products, and transition states of elementary chemical reactions based on quantum chemistry. Sci. Data 7, 137 (2020).
Article CAS PubMed PubMed Central Google Scholar
Friederich, P., dos Passos Gomes, G., De Bin, R., Aspuru-Guzik, A. & Balcells, D. Machine learning dihydrogen activation in the chemical space surrounding Vaska’s complex. Chem. Sci. 11, 4584–4601 (2020).
Article CAS PubMed PubMed Central Google Scholar
Mulliner, D., Wondrousch, D. & Schuurmann, G. Predicting Michael-acceptor reactivity and toxicity through quantum chemical transition-state calculations. Org. Biomol. Chem. 9, 8400–8412 (2011).
Article CAS PubMed Google Scholar
Palazzesi, F. et al. Bireactive: a machine-learning model to estimate covalent warhead reactivity. J. Chem. Inf. Model. 60, 2915–2923 (2020).
Article CAS PubMed Google Scholar
Mortelmans, K. & Zeiger, E. The Ames Salmonella/microsome mutagenicity assay. Mutat. Res. 455, 29–60 (2000).
Article CAS PubMed Google Scholar
Kuhnke, L., Ter Laak, A. & Goller, A. H. Mechanistic reactivity descriptors for the prediction of Ames mutagenicity of primary aromatic amines. J. Chem. Inf. Model. 59, 668–672 (2019).
Article CAS PubMed Google Scholar
Finkelmann, A. R., Goller, A. H. & Schneider, G. Site of metabolism prediction based on ab initio derived atom representations. ChemMedChem 12, 606–612 (2017).
Article CAS PubMed Google Scholar
Rydberg, P., Gloriam, D. E., Zaretzki, J., Breneman, C. & Olsen, L. SMARTCyp: a 2D method for prediction of cytochrome P450-mediated drug metabolism. ACS Med. Chem. Lett. 1, 96–100 (2010).
Article CAS PubMed PubMed Central Google Scholar
Rydberg, P., Rostkowski, M., Gloriam, D. E. & Olsen, L. The contribution of atom accessibility to site of metabolism models for cytochromes P450. Mol. Pharm. 10, 1216–1223 (2013).
Article CAS PubMed Google Scholar
Olsen, L., Montefiori, M., Tran, K. P. & Jørgensen, F. S. SMARTCyp 3.0: enhanced cytochrome P450 site-of-metabolism prediction server. Bioinformatics 35, 3174–3175 (2019).
Article CAS PubMed Google Scholar
Tomberg, A., Johansson, M. J. & Norrby, P.-O. A predictive tool for electrophilic aromatic substitutions using machine learning. J. Org. Chem. 84, 4695–4703 (2019).
Article CAS PubMed Google Scholar
Li, X., Zhang, S. Q., Xu, L. C. & Hong, X. Predicting regioselectivity in radical C–H functionalization of heterocycles through machine learning. Angew. Chem. Int. Ed. 59, 13253–13259 (2020).
Article CAS Google Scholar
De, S., Bartók, A. P., Csányi, G. & Ceriotti, M. Comparing molecules and solids across structural and alchemical space. Phys. Chem. Chem. Phys. 18, 13754–13769 (2016).
Article CAS PubMed Google Scholar
Beker, W., Gajewska, E. P., Badowski, T. & Grzybowski, B. A. Prediction of major regio-, site-, and diastereoisomers in Diels–Alder reactions by using machine-learning: the importance of physically meaningful descriptors. Angew. Chem. Int. Ed. 58, 4515–4519 (2019).
Article CAS Google Scholar
Skoraczyński, G. et al. Predicting the outcomes of organic reactions via machine learning: are current descriptors sufficient? Sci. Rep. 7, 3582 (2017).
Article PubMed PubMed Central Google Scholar
Muratov, E. N. et al. QSAR without borders. Chem. Soc. Rev. 49, 3525–3564 (2020).
Article CAS PubMed PubMed Central Google Scholar
Sigman, M. S., Harper, K. C., Bess, E. N. & Milo, A. The development of multidimensional analysis tools for asymmetric catalysis and beyond. Acc. Chem. Res. 49, 1292–1301 (2016).
Article CAS PubMed Google Scholar
Woods, B. P., Orlandi, M., Huang, C.-Y., Sigman, M. S. & Doyle, A. G. Nickel-catalyzed enantioselective reductive cross-coupling of styrenyl aziridines. J. Am. Chem. Soc. 139, 5688–5691 (2017).
Article CAS PubMed Google Scholar
Hwang, Y., Jung, H., Lee, E., Kim, D. & Chang, S. Quantitative analysis on two-point ligand modulation of iridium catalysts for chemodivergent C–H amidation. J. Am. Chem. Soc. 142, 8880–8889 (2020).
Article PubMed Google Scholar
Ferreira, M. A. B. et al. Noncovalent interactions drive the efficiency of molybdenum imido alkylidene catalysts for olefin metathesis. J. Am. Chem. Soc. 141, 10788–10800 (2019).
Article CAS PubMed Google Scholar
Verloop, A., Hoogenstraaten, W. & Tipker, J. in Drug Design Vol. 11 (ed. Ariëns, E. J.) 165–207 (Academic, 1976).
Santiago, C. B., Guo, J. Y. & Sigman, M. S. Predictive and mechanistic multivariate linear regression models for reaction development. Chem. Sci. 9, 2398–2412 (2018).
Article CAS PubMed PubMed Central Google Scholar
Durand, D. J. & Fey, N. Computational ligand descriptors for catalyst design. Chem. Rev. 119, 6561–6594 (2019).
Article CAS PubMed Google Scholar
Ravasco, J. M. J. M. & Coelho, J. A. S. Predictive multivariate models for bioorthogonal inverse-electron demand Diels–Alder reactions. J. Am. Chem. Soc. 142, 4235–4241 (2020).
Article CAS PubMed Google Scholar
Reid, J. P., Proctor, R. S. J., Sigman, M. S. & Phipps, R. J. Predictive multivariate linear regression analysis guides successful catalytic enantioselective Minisci reactions of diazines. J. Am. Chem. Soc. 141, 19178–19185 (2019).
Article CAS PubMed PubMed Central Google Scholar
Reid, J. P. & Sigman, M. S. Holistic prediction of enantioselectivity in asymmetric catalysis. Nature 571, 343–348 (2019).
Article CAS PubMed PubMed Central Google Scholar
Ahneman, D. T., Estrada, J. G., Lin, S., Dreher, S. D. & Doyle, A. G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 360, 186–190 (2018).
Article CAS PubMed Google Scholar
Chuang, K. V. & Keiser, M. J. Comment on “Predicting reaction performance in C–N cross-coupling using machine learning”. Science 362, eaat8603 (2018).
Article PubMed Google Scholar
Estrada, J. G., Ahneman, D. T., Sheridan, R. P., Dreher, S. D. & Doyle, A. G. Response to Comment on “Predicting reaction performance in C–N cross-coupling using machine learning”. Science 362, eaat8763 (2018).
Article PubMed Google Scholar
Mayr, H. & Patz, M. Scales of nucleophilicity and electrophilicity: a system for ordering polar organic and organometallic reactions. Angew. Chem. Int. Ed. Engl. 33, 938–957 (1994).
Article Google Scholar
Hoffmann, G. et al. Predicting experimental electrophilicities from quantum and topological descriptors: a machine learning approach. J. Comput. Chem. 41, 2124–2136 (2020).
Article CAS Google Scholar
St. John, P. C., Guan, Y., Kim, Y., Kim, S. & Paton, R. S. Prediction of organic homolytic bond dissociation enthalpies at near chemical accuracy with sub-second computational cost. Nat. Commun. 11, 2328 (2020).
Article Google Scholar
St John, P. C. et al. Quantum chemical calculations for over 200,000 organic radical species and 40,000 associated closed-shell molecules. Sci. Data 7, 244 (2020).
Article Google Scholar
Guan, Y. et al. Regio-selectivity prediction with a machine-learned reaction representation and on-the-fly quantum mechanical descriptors. Chem. Sci. 12, 2198–2208 (2021).
Article CAS Google Scholar
Zahrt, A. F. et al. Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning. Science 363, eaau5631 (2019). A recent example of selectivity prediction with results close to experiment.
Article CAS PubMed PubMed Central Google Scholar
Schneider, N., Lowe, D. M., Sayle, R. A. & Landrum, G. A. Development of a novel fingerprint for chemical reactions and its application to large-scale reaction classification and similarity. J. Chem. Inf. Model. 55, 39–53 (2015).
Article CAS PubMed Google Scholar
Ghiandoni, G. M. et al. Development and application of a data-driven reaction classification model: comparison of an electronic lab notebook and medicinal chemistry literature. J. Chem. Inf. Model. 59, 4167–4187 (2019).
Article CAS PubMed Google Scholar
Patel, H., Bodkin, M. J., Chen, B. & Gillet, V. J. Knowledge-based approach to de novo design using reaction vectors. J. Chem. Inf. Model. 49, 1163–1184 (2009).
Article CAS PubMed Google Scholar
Sandfort, F., Strieth-Kalthoff, F., Kühnemund, M., Beecks, C. & Glorius, F. A structure-based platform for predicting chemical reactivity. Chem 6, 1379–1390 (2020).
Article CAS Google Scholar
Duvenaud, D. K. et al. in Advances in Neural Information Processing Systems 28 (eds Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M. & Garnett, R.) 2224–2232 (Curran Associates, 2015).
Wei, J. N., Duvenaud, D. & Aspuru-Guzik, A. Neural networks for the prediction of organic chemistry reactions. ACS Cent. Sci. 2, 725–732 (2016).
Article CAS PubMed PubMed Central Google Scholar
Schwaller, P. et al. Mapping the space of chemical reactions using attention-based neural networks. Nat. Mach. Intell. 3, 144–152 (2021).
Article Google Scholar
Schwaller, P., Vaucher, A. C., Laino, T. & Reymond, J.-L. Prediction of chemical reaction yields using deep learning. Preprint at https://doi.org/10.26434/chemrxiv.12758474.v1 (2020).
Yang, K. et al. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 59, 3370–3388 (2019).
Article CAS PubMed PubMed Central Google Scholar
Varnek, A., Fourches, D., Hoonakker, F. & Solov’ev, V. P. Substructural fragments: an universal language to encode reactions, molecular and supramolecular structures. J. Comput. Aided Mol. Des. 19, 693–703 (2005). This work introduced the CGR–ISIDA approach used for the reactions and conditions prediction, clustering, similarity searching etc.
Article CAS PubMed Google Scholar
Fujita, S. Description of organic reactions based on imaginary transition structures. 1. Introduction of new concepts. J. Chem. Inf. Model. 26, 205–212 (1986).
CAS Google Scholar
Körner, R. & Apostolakis, J. Automatic determination of reaction mappings and reaction center information. 1. The imaginary transition state energy approach. J. Chem. Inf. Model. 48, 1181–1189 (2008).
Article PubMed Google Scholar
Glavatskikh, M. et al. Predictive models for kinetic parameters of cycloaddition reactions. Mol. Inform. 38, 1800077 (2019).
Article CAS Google Scholar
Madzhidov, T. I. et al. Structure–reactivity relationship in bimolecular elimination reactions based on the condensed graph of a reaction. J. Struct. Chem. 56, 1227–1234 (2016).
Article Google Scholar
Gimadiev, T. et al. Bimolecular nucleophilic substitution reactions: predictive models for rate constants and molecular reaction pairs analysis. Mol. Inform. 38, 1800104 (2019).
Article Google Scholar
Marcou, G. et al. Expert system for predicting reaction conditions: the Michael reaction case. J. Chem. Inf. Model. 55, 239–250 (2015).
Article CAS PubMed Google Scholar
Lin, A. I. et al. Automatized assessment of protective group reactivity: a step toward big reaction data analysis. J. Chem. Inf. Model. 56, 2140–2148 (2016).
Article CAS PubMed Google Scholar
Nugmanov, R. I. et al. CGRtools: python library for molecule, reaction, and condensed graph of reaction processing. J. Chem. Inf. Model. 59, 2516–2521 (2019).
Article CAS PubMed Google Scholar
Fialkowski, M., Bishop, K. J. M., Chubukov, V. A., Campbell, C. J. & Grzybowski, B. A. Architecture and evolution of organic chemistry. Angew. Chem. Int. Ed. 44, 7263–7269 (2005).
Article CAS Google Scholar
Szymkuć, S. et al. Computer-assisted synthetic planning: the end of the beginning. Angew. Chem. Int. Ed. 55, 5904–5937 (2016).
Article Google Scholar
Klucznik, T. et al. Efficient syntheses of diverse, medicinally relevant targets planned by computer and executed in the laboratory. Chem 4, 522–532 (2018).
Article CAS Google Scholar
Tiano, K. Merck acquires Grzybowski scientific inventions to expand chemical synthesis offering. Merck https://www.merckmillipore.com/SE/en/20170505_202234 (2017).
Plehiers, P. P., Marin, G. B., Stevens, C. V. & Van Geem, K. M. Automated reaction database and reaction network analysis: extraction of reaction templates using cheminformatics. J. Cheminformatics 10, 11 (2018).
Article Google Scholar
Krallinger, M., Rabal, O., Lourenço, A., Oyarzabal, J. & Valencia, A. Information retrieval and text mining technologies for chemistry. Chem. Rev. 117, 7673–7761 (2017).
Article CAS PubMed Google Scholar
Warr, W. A. A short review of chemical reaction database systems, computer-aided synthesis design, reaction prediction and synthetic feasibility. Mol. Inform. 33, 469–476 (2014).
Article CAS PubMed Google Scholar
Lowe, D. M. Extraction of Chemical Structures and Reactions from the Literature. Doctor of Philosophy (PhD) thesis, Univ. Cambridge (2012).
Zhang, Q.-Y. & Aires-de-Sousa, J. Structure-based classification of chemical reactions without assignment of reaction centers. J. Chem. Inf. Model. 45, 1775–1783 (2005).
Article CAS PubMed Google Scholar
Carrera, G. V. S. M., Gupta, S. & Aires-de-Sousa, J. Machine learning of chemical reactivity from databases of organic reactions. J. Comput. Mol. Des. 23, 419–429 (2009).
Article CAS Google Scholar
Segler, M. H. S. & Waller, M. P. Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chem. Eur. J. 23, 5966–5971 (2017).
Article CAS PubMed Google Scholar
Coley, C. W., Barzilay, R., Jaakkola, T. S., Green, W. H. & Jensen, K. F. Prediction of organic reaction outcomes using machine learning. ACS Cent. Sci. 3, 434–443 (2017).
Article CAS PubMed PubMed Central Google Scholar
Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018). This work introduced a fully data-driven neural network for general reactivity prediction.
Article CAS PubMed Google Scholar
Schneider, N., Stiefl, N. & Landrum, G. A. What’s what: the (nearly) definitive guide to reaction role assignment. J. Chem. Inf. Model. 56, 2336–2346 (2016).
Article CAS PubMed Google Scholar
Jaworski, W. et al. Automatic mapping of atoms across both simple and complex chemical reactions. Nat. Commun. 10, 1434 (2019).
Article PubMed PubMed Central Google Scholar
Schwaller, P., Hoover, B., Reymond, J.-L., Strobelt, H. & Laino, T. Unsupervised attention-guided atom-mapping. Preprint at https://doi.org/10.26434/chemrxiv.12298559.v1 (2020).
Kayala, M. A., Azencott, C.-A., Chen, J. H. & Baldi, P. Learning to predict chemical reactions. J. Chem. Inf. Model. 51, 2209–2222 (2011).
Article CAS PubMed PubMed Central Google Scholar
Kayala, M. A. & Baldi, P. ReactionPredictor: prediction of complex chemical reactions at the mechanistic level using machine learning. J. Chem. Inf. Model. 52, 2526–2540 (2012).
Article CAS PubMed Google Scholar
Fooshee, D. et al. Deep learning for chemical reaction prediction. Mol. Syst. Des. Eng. 3, 442–452 (2018).
Article CAS Google Scholar
Sadowski, P., Fooshee, D., Subrahmanya, N. & Baldi, P. Synergies between quantum mechanics and machine learning in reaction prediction. J. Chem. Inf. Model. 56, 2125–2128 (2016).
Article CAS PubMed Google Scholar
Fujinami, M., Seino, J. & Nakai, H. Quantum chemical reaction prediction method based on machine learning. Bull. Chem. Soc. Jpn. 93, 685–693 (2020).
Article CAS Google Scholar
Jin, W. C., Connor W., Barzilay, R. & Jaakkola, T. in Neural Information Processing Systems (eds Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S. & Garnett, R.) 2607–2616 (Curran Associates, 2017).
Coley, C. W. et al. A graph-convolutional neural network model for the prediction of chemical reactivity. Chem. Sci. 10, 370–377 (2019).
Article CAS PubMed Google Scholar
Schwaller, P. & Laino, T. in Machine Learning in Chemistry: Data-Driven Algorithms, Learning Systems, and Predictions Vol. 1326 61–79 (American Chemical Society, 2019).
Liu, B. et al. Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS Cent. Sci. 3, 1103–1113 (2017).
Article CAS PubMed PubMed Central Google Scholar
Schwaller, P., Gaudin, T., Lanyi, D., Bekas, C. & Laino, T. “Found in Translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models. Chem. Sci. 9, 6091–6098 (2018).
Article CAS PubMed PubMed Central Google Scholar
Schwaller, P. et al. Molecular Transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Cent. Sci. 5, 1572–1583 (2019). In this work, natural language processing methods were successfully used for general reaction prediction.
Article CAS PubMed PubMed Central Google Scholar
Alammar, J. The Illustrated Transformer. J. Alammar http://jalammar.github.io/illustrated-transformer/ (2018).
Walker, E. et al. Learning to predict reaction conditions: relationships between solvent, molecular structure, and catalyst. J. Chem. Inf. Model. 59, 3645–3654 (2019).
Article CAS PubMed PubMed Central Google Scholar
Gao, H. et al. Using machine learning to predict suitable conditions for organic reactions. ACS Cent. Sci. 4, 1465–1476 (2018).
Article CAS PubMed PubMed Central Google Scholar
Coley, C. W., Green, W. H. & Jensen, K. F. Machine learning in computer-aided synthesis planning. Acc. Chem. Res. 51, 1281–1289 (2018).
Article CAS PubMed Google Scholar
Segler, M. H. S. & Waller, M. P. Modelling chemical reasoning to predict and invent reactions. Chem. Eur. J. 23, 6118–6128 (2017).
Article CAS PubMed Google Scholar
Gromski, P. S., Henson, A. B., Granda, J. M. & Cronin, L. How to explore chemical space using algorithms and automation. Nat. Rev. Chem. 3, 119–128 (2019).
Article Google Scholar
Wang, Z., Zhao, W., Hao, G. & Song, B. Automated synthesis: current platforms and further needs. Drug Discov. Today 25, 2006–2011 (2020).
Article CAS Google Scholar
Nesterov, V., Wieser, M. & Roth, V. J. 3DMolNet: a generative network for molecular structures. Preprint at https://arxiv.org/abs/2010.06477 (2020).
Pattanaik, L., Ingraham, J. B., Grambow, C. A. & Green, W. H. Generating transition states of isomerization reactions with deep learning. Phys. Chem. Chem. Phys. 22, 23618–23626 (2020).
Article CAS PubMed Google Scholar
Smith, J. S. et al. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci. Data 7, 134 (2020).
Article CAS PubMed PubMed Central Google Scholar
Kammeraad, J. A., Goetz, J., Walker, E. A., Tewari, A. & Zimmerman, P. M. What does the machine learn? Knowledge representations of chemical reactivity. J. Chem. Inf. Model. 60, 1290–1301 (2020).
Article CAS PubMed PubMed Central Google Scholar
Herges, R. & Hoock, C. Reaction planning: computer-aided discovery of a novel elimination reaction. Science 255, 711–713 (1992).
Article CAS PubMed Google Scholar
William, B. et al. Discovery of novel chemical reactions by deep generative recurrent neural network. Sci. Rep. 11, 3178 (2021).
Article Google Scholar
Unsleber, J. P. & Reiher, M. The exploration of chemical reaction networks. Annu. Rev. Phys. Chem. 71, 121–142 (2020).
Article CAS PubMed Google Scholar
Sameera, W. M. C., Maeda, S. & Morokuma, K. Computational catalysis using the artificial force induced reaction method. Acc. Chem. Res. 49, 763–773 (2016).
Article CAS PubMed Google Scholar
Martínez, T. J. Ab initio reactive computer aided molecular design. Acc. Chem. Res. 50, 652–656 (2017).
Article PubMed Google Scholar
Rappoport, D., Galvin, C. J., Zubarev, D. Y. & Aspuru-Guzik, A. Complex chemical reaction networks from heuristics-aided quantum chemistry. J. Chem. Theory Comput. 10, 897–907 (2014).
Article CAS PubMed Google Scholar
Bergeler, M., Simm, G. N., Proppe, J. & Reiher, M. Heuristics-guided exploration of reaction mechanisms. J. Chem. Theory Comput. 11, 5712–5722 (2015).
Article CAS PubMed Google Scholar
Smith, D. G. A. et al. The MolSSI QCArchive project: an open-source platform to compute, organize, and share quantum chemistry data. Wiley Interdiscip. Rev. Comput. Mol. Sci. 11, e1491 (2020).
Google Scholar
Álvarez-Moreno, M. et al. Managing the computational chemistry big data problem: the ioChem-BD platform. J. Chem. Inf. Model. 55, 95–103 (2014).
Article PubMed Google Scholar
Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).
Article CAS PubMed Google Scholar
Jaeger, S., Fulle, S. & Turk, S. Mol2vec: unsupervised machine learning approach with chemical intuition. J. Chem. Inf. Model. 58, 27–35 (2018).
Article CAS PubMed Google Scholar
Feinberg, E. N. et al. PotentialNet for molecular property prediction. ACS Cent. Sci. 4, 1520–1530 (2018).
Article CAS PubMed PubMed Central Google Scholar
Coley, C. W., Barzilay, R., Green, W. H., Jaakkola, T. S. & Jensen, K. F. Convolutional embedding of attributed molecular graphs for physical property prediction. J. Chem. Inf. Model. 57, 1757–1772 (2017).
Article CAS PubMed Google Scholar
Korolev, V., Mitrofanov, A., Korotcov, A. & Tkachenko, V. Graph convolutional neural networks as “general-purpose” property predictors: the universality and limits of applicability. J. Chem. Inf. Model. 60, 22–28 (2020).
Article CAS PubMed Google Scholar
Kearnes, S., McCloskey, K., Berndl, M., Pande, V. & Riley, P. Molecular graph convolutions: moving beyond fingerprints. J. Comput. Mol. Des. 30, 595–608 (2016).
Article CAS Google Scholar
Cawley, G. C. & Talbot, N. L. C. On over-fitting in model selection and subsequent selection bias in performance evaluation. J. Mach. Learn. Res. 11, 2079–2107 (2010).
Google Scholar
Varma, S. & Simon, R. Bias in error estimation when using cross-validation for model selection. BMC Bioinforma. 7, 91 (2006).
Hanser, T., Barber, C., Marchaland, J. F. & Werner, S. Applicability domain: towards a more formal definition. SAR QSAR Environ. Res. 27, 865–881 (2016).
Abu-Mostafa, Y. S., Magdon-Ismail, M. & Lin, H. T. Learning from Data: A Short Course (AMLBook.com, 2012).
Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction 2nd edn (Springer, 2009).
Harrell, F. E. Regression Modeling Strategies: with Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis 2nd edn (Springer, 2015).
James, G., Witten, D., Hastie, T. & Tibshirani, R. An Introduction to Statistical Learning: with Applications in R (Springer, 2013).

Acknowledgements

K.J. is a fellow of the AstraZeneca Postdoc Programme.

Author information

Authors and Affiliations

Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield, UK
Kjell Jorner
Research and Early Development, Cardiovascular, Renal and Metabolism, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
Anna Tomberg
Data Science and Modelling, Pharmaceutical Sciences, R&D, AstraZeneca, Gothenburg, Sweden
Christoph Bauer & Per-Ola Norrby
Drug Design and Discovery, Department of Medicinal Chemistry, Uppsala University, Uppsala, Sweden
Christian Sköld

Contributions

All authors contributed equally to the preparation of this manuscript.

Corresponding author

Correspondence to Per-Ola Norrby.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information

Nature Reviews Chemistry thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Glossary

Density functional theory: (DFT). A quantum-mechanical method based on electron density for simulating molecules and reactions.
Descriptors: Also referred to as features. The properties used to train a machine learning model.
Semiempirical QM methods: Use the same algorithms as wave function and density functional theory methods, but with approximated (often empirically parameterized) values for the matrix elements.
Domains of applicability: The regions of chemical space within which a model can reliably make predictions.
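Gaussian process regression: Machine learning algorithm that treats the unknown function as a Gaussian process, that is, the function values at any set of points are assumed to follow a joint Gaussian distribution whose covariances are given by a kernel function. Delivers both a predicted mean and a variance (an uncertainty estimate) for each prediction.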
Extra tree regressor model: Machine learning algorithm similar to random forest. Because split thresholds are drawn at random rather than optimized, this method is usually faster to train than a random forest.
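Random forest: Machine learning algorithm that builds an ensemble of decision trees, each trained on a bootstrap sample of the data with a random subset of descriptors considered at each split, and predicts the value of a new example by aggregating the predictions of all trees in the ensemble (for regression, typically by averaging).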
Sterimol parameters: A set of parameters that describes the steric effects of substituents.
Gradient boosting decision tree model: Machine learning algorithm that is based on decision trees (see ‘random forest’). The model is built stepwise: each new tree is fitted to the remaining errors (more precisely, the negative gradient of the loss) of the current ensemble, and its contribution is scaled down by a learning rate. This shrinkage has been shown to reduce overfitting.
Receiver operating characteristic: (ROC). Curve of the true positive rate versus the false positive rate of a machine learning classification algorithm, obtained by varying the decision threshold. The area under the ROC curve is often used as a performance metric.
Support vector machine: (SVM). A machine learning algorithm based on the idea that data points of different classes can be separated by a hyperplane. The model chooses the hyperplane that maximizes the margin, that is, the distance between the hyperplane and the nearest data points of each class.
Deep feed-forward neural network models: A feed-forward neural network, also called a multilayer perceptron, is one of the basic architectures in machine learning, in which the input nodes connect to hidden layers of nodes, which, in turn, connect to the output nodes. A neural network is feed-forward when no output information is channelled back into the model, as opposed to recurrent networks.
Molecular fingerprints: Molecular representations derived from the molecular connectivity.
Representations: Machine-readable descriptions of a molecule as, for example, a string of characters, a vector or a graph.
Atom mapping: Refers to the labelling of atoms in the reactants and the corresponding atoms in the products in a reaction SMARTS.
Deep learning: The field of machine learning that uses neural networks with many hidden layers.
Templates: Patterns describing a chemical reaction, often represented by reaction SMARTS.
SMARTS: A string representation of a molecular pattern, based on the simplified molecular input line entry system (SMILES). SMARTS are used to define a substructure of a molecule. For example, ethanol could be represented using the SMILES string CCO. To define the alcohol functional group, one can use the SMARTS [#6][OX2H], in which each atomic position is enclosed in square brackets and encodes which atom types are allowed at this position: [#6] matches any carbon atom (atomic number 6), and [OX2H] matches an oxygen atom with two connections, one of them to hydrogen.
Negative reactions: Reactions that give a low or zero yield. These are important for machine learning because the model needs to learn that not all input leads to a product.
Graph convolutional networks: Neural networks that operate on a graph and use convolution to create their own features for learning.
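As an illustration of the Gaussian process regression entry, the sketch below computes, in plain Python, the standard posterior mean and variance of a zero-mean Gaussian process at a new input x*: mean = k*ᵀ(K + σ²I)⁻¹y and variance = k(x*, x*) − k*ᵀ(K + σ²I)⁻¹k*, where K is the kernel matrix of the training inputs, k* is the vector of kernel values between x* and the training inputs, and σ² is a noise term. It assumes one-dimensional inputs, a squared-exponential (RBF) kernel and fixed, hand-chosen hyperparameters; the function names are illustrative, and a practical model would fit the hyperparameters to the data and rely on a numerical library instead of hand-written triangular solves.

```python
import math


def rbf(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_t(L, b):
    """Back substitution: solve L^T x = b, given the lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / L[i][i]
    return x


def gp_predict(x_train, y_train, x_new, noise=1e-6, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at each point in x_new."""
    n = len(x_train)
    # Kernel matrix of the training inputs plus noise on the diagonal.
    K = [[rbf(x_train[i], x_train[j], **kernel_args) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    # alpha = (K + noise * I)^-1 y, obtained with two triangular solves.
    alpha = solve_upper_t(L, solve_lower(L, y_train))
    means, variances = [], []
    for xs in x_new:
        k_star = [rbf(xi, xs, **kernel_args) for xi in x_train]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        # k_star^T (K + noise * I)^-1 k_star equals |v|^2 with v = L^-1 k_star.
        v = solve_lower(L, k_star)
        variances.append(rbf(xs, xs, **kernel_args) - sum(vi * vi for vi in v))
    return means, variances
```

Near a training point the predicted variance falls to roughly the noise level, whereas far from all training points it reverts to the prior variance (1.0 with these defaults); this behaviour is why the GP variance is often read as a hint of whether a query lies inside the model's domain of applicability.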
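The stepwise construction described in the gradient boosting entry can be illustrated with regression stumps (decision trees with a single split) and a squared-error loss, for which the negative gradient is simply the residual: each new stump is fitted to the current residuals and added with a weight equal to the learning rate. The following is a minimal plain-Python sketch under those assumptions; the stump learner, the function names and the default settings are illustrative and do not reproduce any particular library's implementation.

```python
def fit_stump(x, r):
    """Best single-split regression tree on 1D inputs x for targets r (least squares)."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0  # candidate threshold between consecutive distinct values
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - ml) ** 2 for ri in left) + sum((ri - mr) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all inputs identical: fall back to a constant prediction
        m = sum(r) / len(r)
        return lambda xi: m
    _, t, ml, mr = best
    return lambda xi: ml if xi <= t else mr


def gradient_boost(x, y, n_trees=100, learning_rate=0.1):
    """Return a predictor built stepwise from shrunken stumps fitted to residuals."""
    base = sum(y) / len(y)  # start from the mean of the targets
    trees = []

    def predict(xi):
        # Each tree's output is scaled down by the learning rate (shrinkage).
        return base + learning_rate * sum(tree(xi) for tree in trees)

    for _ in range(n_trees):
        # For squared-error loss, the negative gradient is the residual.
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        trees.append(fit_stump(x, residuals))
    return predict
```

With targets [1, 1, 1, 5, 5, 5] at x = 1 to 6 and a learning rate of 0.1, every stump splits at x = 3.5 and removes 10% of the remaining residual, so the residual after k trees is 2 × 0.9^k. A smaller learning rate makes each step more cautious at the cost of needing more trees.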
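The ROC entry can be made concrete with a short plain-Python sketch. The area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties counting one half), so it can be computed from pairwise comparisons, while the curve itself is traced by lowering the decision threshold through the observed scores. Binary labels (1 for the positive class, for example a reaction that gives product) and real-valued classifier scores are assumed, and the function names are illustrative rather than a library API.

```python
from itertools import product


def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: sequence of 0/1 class labels (1 = positive class).
    scores: classifier scores; higher means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs that are ranked correctly; ties count as half.
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(labels, scores):
    """(false positive rate, true positive rate) pairs, sweeping the threshold down."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        # Predict "positive" whenever the score is at least the threshold t.
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points
```

For the labels [1, 1, 0, 1, 0, 0] with scores [0.9, 0.8, 0.7, 0.6, 0.4, 0.2], eight of the nine positive/negative pairs are ranked correctly, giving an AUC of 8/9 (about 0.89), which matches the trapezoidal area under the points returned by roc_points.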
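The molecular fingerprints entry refers to representations built from connectivity. A heavily simplified plain-Python sketch of the circular (Morgan-type) idea behind the extended-connectivity fingerprints of Rogers and Hahn, listed in the references, is given here: each atom starts from an identifier derived from its element and heavy-atom degree, identifiers are repeatedly updated from those of the neighbouring atoms, and every identifier is folded into a fixed-length set of bits. The choice of atom invariants, the SHA-1-based hashing and all function names are assumptions for illustration only; the published ECFP algorithm uses a richer set of atom invariants and additional bookkeeping, and in practice a cheminformatics toolkit would be used.

```python
import hashlib


def _hash(obj, modulus):
    """Deterministic hash of a Python object (via its repr) into [0, modulus)."""
    digest = hashlib.sha1(repr(obj).encode()).hexdigest()
    return int(digest, 16) % modulus


def circular_fingerprint(atoms, bonds, radius=2, n_bits=1024):
    """Set of 'on' bit indices for a simplified Morgan/ECFP-like fingerprint.

    atoms: list of element symbols, indexed by atom number (heavy atoms only).
    bonds: list of (i, j, bond_order) tuples between heavy atoms.
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        neighbours[i].append((order, j))
        neighbours[j].append((order, i))
    # Initial identifier per atom: element symbol and heavy-atom degree.
    ids = [_hash((atoms[i], len(neighbours[i])), 2**32) for i in range(len(atoms))]
    bits = {x % n_bits for x in ids}
    for _ in range(radius):
        # New identifier from the atom's own identifier and its sorted
        # (bond order, neighbour identifier) pairs, so that after r rounds it
        # reflects the environment within r bonds, independent of atom order.
        ids = [
            _hash((ids[i], tuple(sorted((order, ids[j]) for order, j in neighbours[i]))), 2**32)
            for i in range(len(atoms))
        ]
        bits |= {x % n_bits for x in ids}
    return bits


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

With this sketch, two different atom orderings of ethanol (C–C–O) give identical bit sets and hence a Tanimoto similarity of 1, whereas dimethyl ether (C–O–C), which has the same atoms but different connectivity, should give a different bit set.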

About this article

Cite this article

Jorner, K., Tomberg, A., Bauer, C. et al. Organic reactivity from mechanism to machine learning. Nat Rev Chem 5, 240–255 (2021). https://doi.org/10.1038/s41570-021-00260-x

Accepted: 10 February 2021
Published: 16 March 2021
Issue Date: April 2021
DOI: https://doi.org/10.1038/s41570-021-00260-x

This article is cited by

Directional multiobjective optimization of metal complexes at the billion-system scale
- Hannes Kneiding
- Ainara Nova
- David Balcells
Nature Computational Science (2024)
A physical organic strategy to predict and interpret stabilities of chemical bonds in energetic compounds for the discovery of thermal-resistant properties
- Haitao Liu
- Peng Chen
- Xianfeng Wei
Journal of Molecular Modeling (2024)
A neural network model informs the total synthesis of clovane sesquiterpenoids
- Pengpeng Zhang
- Jungmin Eun
- Timothy R. Newhouse
Nature Synthesis (2023)
Element selection for functional materials discovery by integrated machine learning of elemental contributions to properties
- Andrij Vasylenko
- Dmytro Antypov
- Matthew J. Rosseinsky
npj Computational Materials (2023)
Structural design of organic battery electrode materials: from DFT to artificial intelligence
- Ting-Ting Wu
- Gao-Le Dai
- Yu-Min Qian
Rare Metals (2023)

Change history

22 March 2021
