Trends in Chemistry
Volume 3, Issue 2, February 2021, Pages 96-110

Opinion
Special Issue: Machine Learning for Molecules and Materials
Navigating through the Maze of Homogeneous Catalyst Design with Machine Learning

https://doi.org/10.1016/j.trechm.2020.12.006

Highlights

  • Microkinetic models of homogeneous catalytic reactions are constructed using ab initio simulations to gain insight into mechanisms, rationalize catalyst performance, and inspire catalyst design.

  • Using a data-driven approach, the classical linear free-energy relationships are extended to multiple linear regression to study both reactivity and selectivity of homogeneous catalysts and build models to rationalize important interactions.

  • Volcano plots are demonstrated as a general analysis framework when comparing potential energy surfaces of related catalytic reactions to extract decisive parameters and visualize breaks in mechanisms.

  • Nonlinear regression models including random forests, support-vector machines, Gaussian processes, and artificial neural networks are applied to predict reaction yields, enantioselectivity, and activation energies of catalytic reactions based on both experimental and computational data.
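As a rough illustration of the second highlight, extending a one-descriptor linear free-energy relationship to multiple linear regression amounts to an ordinary least-squares fit over several descriptors. The sketch below is not the authors' workflow: the function name `fit_mlr`, the two descriptor columns, and all numbers are made up for illustration, and the response is generated from known coefficients so that the fit can be checked.

```python
from typing import List, Sequence


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (A^T A) b = A^T y.

    Returns [intercept, coef_1, ..., coef_p].
    """
    # Design matrix with a leading column of ones for the intercept.
    A = [[1.0, *row] for row in X]
    p = len(A[0])
    # Augmented normal-equation system [A^T A | A^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(p)]
        + [sum(a[i] * t for a, t in zip(A, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; the fit is not unique")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


# Hypothetical descriptors for six catalysts: (electronic, steric).
# These values are invented for the example, not measured constants.
X_demo = [[-0.3, 0.0], [-0.1, 1.2], [0.0, 0.5], [0.2, 1.0], [0.5, 0.3], [0.8, 1.5]]
# Synthetic response built from known coefficients (0.5, 2.0, -1.0).
y_demo = [0.5 + 2.0 * s - 1.0 * e for s, e in X_demo]

coeffs = fit_mlr(X_demo, y_demo)
print([round(c, 6) for c in coeffs])
```

With real data, the fitted coefficients indicate how strongly each descriptor contributes to the modeled reactivity or selectivity, which is what makes such models useful for rationalizing interactions.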

The ability to forge difficult chemical bonds through catalysis has transformed society on all fronts, from feeding the ever-growing population to increasing life expectancies through the synthesis of new drugs. However, developing new chemical reactions and catalytic systems is a tedious task that requires tremendous discovery and optimization efforts. Over the past decade, advances in machine learning (ML) have opened up a whole new way to approach data-intensive problems, and many of these developments have started to enter chemistry. Meanwhile, similar advances in the field of homogeneous catalysis are only in their infancy. In this perspective, we outline our vision for the future of homogeneous catalyst design and the role of ML in navigating this maze.

Section snippets

Societal Importance and Challenges

About 85% of all industrial chemistry processes are catalytic [1]. Roughly 25% of the global human energy consumption is used for producing chemicals [1] and the chemical industry accounts for approximately 7% of the global anthropogenic greenhouse gas emissions [2]. To limit the global mean temperature rise to 2°C above preindustrial levels, a total reduction of absolute CO2 emissions in the chemical industry of 30% by 2050 is necessary, despite a projected increase in demand of 180% for the

Basic Concepts and Status Quo

In recent years, computational chemistry has experienced a tremendous surge due to increasing computational power and the accompanying heightened practicality of simulating ever-larger ensembles of atoms. Accordingly, these significant advances shifted the focus from developing methods for simulating matter and benchmarking the results against experiments to predicting the properties of unknown materials and defining new targets for experimental verification. Similarly, computer-aided catalyst design

The Role of Representations and Catalyst Informatics

One cornerstone of ML is the representation of data, which is used to train models. In the automated DMTA cycle, representations play a major role in design as they facilitate learning and determine both inputs and outputs of models. Most commonplace in chemistry are descriptors, typically 1D real vectors providing information about a given (sub)structure, a classical example being the Hammett substituent constants [30]. In their early days, descriptors were derived largely from experiments,
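For orientation, the Hammett relationship behind these substituent constants is commonly written as below. The symbols follow common textbook notation rather than anything defined in this excerpt: $k_X$ and $k_H$ are the rate (or equilibrium) constants of the substituted and unsubstituted compound, $\sigma_X$ is the substituent constant, and $\rho$ is the reaction constant.

```latex
\log_{10}\!\left(\frac{k_X}{k_H}\right) = \rho\,\sigma_X
```

In this picture, $\sigma_X$ is a single-number descriptor of the substituent, while $\rho$ captures how sensitive a given reaction is to it.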

Data Swamps and Data Lakes

Another cornerstone of ML is data, which are generated throughout the entire DMTA cycle and used by ML models for training to make decisions. Large data collections are stored in databases, which are commonplace in chemistry and can be used as starting points for catalyst design campaigns. Among the most important chemistry databases are structure and reaction databases [53], the Cambridge Structural Database (CSD) for crystal structures [54], the Protein Data Bank (PDB) for protein structures [

Robust Synthesis and Data-Driven Experimentation

Next, we look into the experimental side of the automated DMTA cycle. The main bottleneck of catalyst design is usually synthesis (i.e., the make node). Hence, catalyst synthesis needs to be rethought from the ground up for efficient closed-loop optimizations [66]. Additionally, it is important to differentiate between derivatizing established scaffolds, for instance, making new phosphines for better cross-couplings, and synthesizing entirely new structures. The former can likely be achieved

Computer-Driven Design and Workflow Orchestration

The final piece in the DMTA cycle is computer-driven design. The design of molecules can be automated using generative models [83]. They are useful for homogeneous catalysis whenever new catalysts are sought. This can be as complicated as exploring transition-metal complexes catalyzing a new transformation but can also be as straightforward as finding phosphine ligands for efficient cross-couplings. In the context of deep learning, deep generative models have entered the public discussion via

Explaining Black Boxes as a New Paradigm

At present, progress in ML comes at an astonishing pace, but it takes time for the most recent developments to enter other fields. One of the outstanding challenges is explainable artificial intelligence (AI), also termed the interpretability problem [105]. The current black-box nature of many ML approaches is unsatisfying, as Eugene Wigner said [106]: ‘It is nice to know that the computer understands the problem. But I would like to understand it too.’ Accordingly, the importance of

Concluding Remarks

The path toward autonomous catalyst discovery is far from linear, as many design and implementation choices remain to be decided (see Outstanding Questions). Accordingly, we foreshadow the future of homogeneous catalyst design as a maze (Figure 5) with autonomous closed-loop discovery as the ultimate dream. We incorporated what we envision to be important milestones as forks along the path. However, while Figure 5 depicts only one path toward the center of the maze, as the proverb goes, ‘All

Acknowledgments

G.d.P.G. gratefully acknowledges the Natural Sciences and Engineering Research Council of Canada (NSERC) for the Banting Postdoctoral Fellowship. We acknowledge the Defense Advanced Research Projects Agency (DARPA) under the Accelerated Molecular Discovery Program under Cooperative Agreement No. HR00111920027 dated August 1, 2019. The content of the information presented in this work does not necessarily reflect the position or the policy of the Government. A.A-G. thanks Anders G. Frøseth for

Disclaimer Statement

A.A-G. is co-founder and Chief Visionary Officer of Kebotix, Inc.

Glossary

Artificial neural network
network of connected artificial neurons used for computation. The input of artificial neurons is processed with specific mathematical operations and the output is transmitted to all connected artificial neurons. The input of the network is propagated through the entire network to provide the final output of the computation.
Closed-loop optimization
optimization strategy combining both computational and experimental techniques and data to infer subsequent evaluations and
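The artificial neural network entry in this glossary can be made concrete with a minimal sketch of a fully connected feed-forward pass: each neuron forms a weighted sum of its inputs, applies an activation function, and passes the result to every neuron in the next layer. The layer layout, the weights, and the names `forward`, `relu`, and `identity` are assumptions chosen for illustration, not taken from the article.

```python
from typing import Callable, List, Sequence, Tuple

Activation = Callable[[float], float]
# One layer = (weight rows, biases, activation); each weight row feeds one neuron.
Layer = Tuple[Sequence[Sequence[float]], Sequence[float], Activation]


def relu(z: float) -> float:
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, z)


def identity(z: float) -> float:
    """Linear output neuron."""
    return z


def forward(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Propagate the input through every layer and return the final output."""
    signal = list(x)
    for weights, biases, activation in layers:
        # Each neuron: weighted sum of all incoming signals plus bias,
        # then the activation; the results feed the next layer.
        signal = [
            activation(sum(w * s for w, s in zip(row, signal)) + b)
            for row, b in zip(weights, biases)
        ]
    return signal


# Hand-picked toy network: 2 inputs -> 2 hidden neurons (ReLU) -> 1 output.
network: List[Layer] = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[2.0, 1.0]], [0.5], identity),
]

output = forward([1.0, 2.0], network)
print(output)  # hidden layer gives [0.0, 1.5], so the output is [2.0]
```

In practice the weights are not chosen by hand but fitted to training data, and the inputs would be molecular representations such as descriptor vectors.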

References (117)

  • S.A. Weissman et al.

    Design of experiments (DoE) and process optimization. A review of recent publications

    Org. Process. Res. Dev.

    (2015)
  • F. Häse

    Chimera: enabling hierarchy based multi-objective optimization for self-driving laboratories

    Chem. Sci.

    (2018)
  • J.M. Thomas

    Summarizing comments on the discussion and a prospectus for urgent future action

    Philos. Trans. R. Soc. Lond. A

    (2016)
  • P.G. Levi et al.

    Mapping global flows of chemicals: from fossil fuel feedstocks to chemical products

    Environ. Sci. Technol.

    (2018)
  • S. Bhaduri et al.

    Chemical industry and homogeneous catalysis

  • National Research Council

    Impact of Advances in Computing and Communications Technologies on Chemical Science and Technology: Report of a Workshop

    (1999)
  • B. Sanchez-Lengeling et al.

    Inverse molecular design using machine learning: generative models for matter engineering

    Science

    (2018)
  • J. Hagen

    Homogeneously catalyzed industrial processes

  • R. Franke

    Applied hydroformylation

    Chem. Rev.

    (2012)
  • J.A. Keith et al.

    The mechanism of the Wacker reaction: a tale of two hydroxypalladations

    Angew. Chem. Int. Ed.

    (2009)
  • N. Miyaura et al.

    Palladium-catalyzed cross-coupling reactions of organoboron compounds

    Chem. Rev.

    (1995)
  • K.N. Houk et al.

    Computational prediction of small-molecule catalysts

    Nature

    (2008)
  • L.C. Burrows

    Computationally guided catalyst design in the type I dynamic kinetic asymmetric Pauson–Khand reaction of allenyl acetates

    J. Am. Chem. Soc.

    (2017)
  • R.N. Straker

    Computational ligand design in enantio- and diastereoselective ynamide [5+2] cycloisomerization

    Nat. Commun.

    (2016)
  • Y. Guan

    AARON: an automated reaction optimizer for new catalysts

    J. Chem. Theory Comput.

    (2018)
  • M. Foscato et al.

    Automated in silico design of homogeneous catalysts

    ACS Catal.

    (2020)
  • Y. Wang

    A computationally designed Rh(I)-catalyzed two-component [5+2+1] cycloaddition of ene-vinylcyclopropanes and CO for the synthesis of cyclooctenones

    J. Am. Chem. Soc.

    (2007)
  • M.C. Nielsen

    Computational ligand design for the reductive elimination of ArCF3 from a small bite angle PdII complex: remarkable effect of a perfluoroalkyl phosphine

    Angew. Chem. Int. Ed.

    (2014)
  • J.P. Reid et al.

    Comparing quantitative prediction methods for the discovery of small-molecule chiral catalysts

    Nat. Rev. Chem.

    (2018)
  • F. Strieth-Kalthoff

    Machine learning the ropes: principles, applications and directions in synthetic chemistry

    Chem. Soc. Rev.

    (2020)
  • A. Milo

    Interrogating selectivity in catalysis using molecular vibrations

    Nature

    (2014)
  • M. Orlandi

    Parametrization of non-covalent interactions for transition state interrogation applied to asymmetric catalysis

    J. Am. Chem. Soc.

    (2017)
  • X.Y. See

    Iterative supervised principal component analysis driven ligand design for regioselective Ti-catalyzed pyrrole synthesis

    ACS Catal.

    (2020)
  • D.T. Ahneman

    Predicting reaction performance in C–N cross-coupling using machine learning

    Science

    (2018)
  • A.F. Zahrt

    Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning

    Science

    (2019)
  • A.R. Rosales

    Rapid virtual screening of enantioselective catalysts using CatVS

    Nat. Catal.

    (2019)
  • K. Jorner

    Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies

    Chem. Sci.

    (2020)
  • P. Sabatier

    Hydrogénations et déshydrogénations par catalyse

    Ber. Dtsch. Chem. Ges.

    (1911)
  • M. Busch

    A generalized picture of C–C cross-coupling

    ACS Catal.

    (2017)
  • M.D. Wodrich

    Activity-based screening of homogeneous catalysts through the rapid assessment of theoretically derived turnover frequencies

    ACS Catal.

    (2019)
  • L.P. Hammett

    The effect of structure upon the reactions of organic compounds. Benzene derivatives

    J. Am. Chem. Soc.

    (1937)
  • R. Todeschini et al.

    Handbook of Molecular Descriptors

    (2000)
  • J.P. Janet et al.

    Predicting electronic structure properties of transition metal complexes with neural networks

    Chem. Sci.

    (2017)
  • D.K. Duvenaud

    Convolutional networks on graphs for learning molecular fingerprints

    Adv. Neural Inf. Process. Syst.

    (2015)
  • J. Gilmer

    Neural message passing for quantum chemistry

  • T.F.G.G. Cova et al.

    Deep learning for deep chemistry: optimizing the prediction of chemical patterns

    Front. Chem.

    (2019)
  • D. Weininger

    SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules

    J. Chem. Inf. Comput. Sci.

    (1988)
  • P. Schwaller

    Molecular Transformer: a model for uncertainty-calibrated chemical reaction prediction

    ACS Cent. Sci.

    (2019)
  • M. Hirohara

    Convolutional neural network based on SMILES representation of compounds for detecting chemical motif

    BMC Bioinformatics

    (2018)
  • M. Krenn

    Self-referencing embedded strings (SELFIES): a 100% robust molecular string representation

    Mach. Learn. Sci. Technol.

    (2020)

    5

    These authors contributed equally to this work
