Trends in Chemistry
Opinion
Special Issue: Machine Learning for Molecules and Materials
Navigating through the Maze of Homogeneous Catalyst Design with Machine Learning
Societal Importance and Challenges
About 85% of all industrial chemistry processes are catalytic [1]. Roughly 25% of the global human energy consumption is used for producing chemicals [1] and the chemical industry accounts for approximately 7% of the global anthropogenic greenhouse gas emissions [2]. To limit the global mean temperature rise to 2°C above preindustrial levels, a total reduction of absolute CO2 emissions in the chemical industry of 30% by 2050 is necessary, despite a projected increase in demand of 180% for the
Basic Concepts and Status Quo
In recent years, computational chemistry has experienced a tremendous surge due to increasing computational power and the growing practicality of simulating ever-larger ensembles of atoms. Accordingly, these advances shifted the focus from developing methods for simulating matter and benchmarking the results against experiments, to predicting the properties of unknown materials and defining new targets for experimental verification. Similarly, computer-aided catalyst design
The Role of Representations and Catalyst Informatics
One cornerstone of ML is the representation of data, which is used to train models. In the automated design-make-test-analyze (DMTA) cycle, representations play a major role in design as they facilitate learning and determine both inputs and outputs of models. Most commonplace in chemistry are descriptors, typically 1D real vectors providing information about a given (sub)structure, a classical example being the Hammett substituent constants [30]. In their early days, descriptors were derived largely from experiments,
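To make the idea of a descriptor concrete, the following minimal sketch treats Hammett para-substituent constants as a one-number-per-substituent descriptor and applies the Hammett relation log10(k_X/k_H) = ρσ_X. The σ values are commonly tabulated literature constants; the function names and the reaction constant ρ used in the usage note are hypothetical choices for illustration, not taken from the article.

```python
# Commonly tabulated Hammett para-substituent constants (sigma_p).
SIGMA_PARA = {
    "H": 0.00,
    "CH3": -0.17,
    "OCH3": -0.27,
    "Cl": 0.23,
    "NO2": 0.78,
}


def hammett_log_ratio(substituent: str, rho: float) -> float:
    """Return log10(k_X / k_H) = rho * sigma_X for a para substituent X.

    `rho` is the reaction constant; any value passed here is an
    assumption made by the caller for illustration.
    """
    return rho * SIGMA_PARA[substituent]


def descriptor_vector(substituents):
    """Encode a list of para substituents as a 1D real vector of sigma_p values."""
    return [SIGMA_PARA[s] for s in substituents]
```

For example, with a hypothetical reaction constant of ρ = 2.0, `hammett_log_ratio("NO2", 2.0)` predicts a rate enhancement relative to the unsubstituted compound, whereas `hammett_log_ratio("OCH3", 2.0)` predicts a retardation. A vector such as `descriptor_vector(["H", "Cl", "NO2"])` is the kind of input a regression model could be trained on.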
Data Swamps and Data Lakes
Another cornerstone of ML is data, which are generated throughout the entire DMTA cycle and used by ML models for training to make decisions. Large data collections are stored in databases, which are commonplace in chemistry and can be used as starting points for catalyst design campaigns. Among the most important chemistry databases are structure and reaction databases [53], the Cambridge Structural Database (CSD) for crystal structures [54], the Protein Data Bank (PDB) for protein structures [
Robust Synthesis and Data-Driven Experimentation
Next, we look into the experimental side of the automated DMTA cycle. The bottleneck of catalyst design is usually synthesis (i.e., the make node). Hence, catalyst synthesis needs to be rethought from the ground up for efficient closed-loop optimizations [66]. Additionally, it is important to differentiate between derivatizing established scaffolds, for instance, making new phosphines for better cross-couplings, and synthesizing entirely new structures. The former can likely be achieved
Computer-Driven Design and Workflow Orchestration
The final piece in the DMTA cycle is computer-driven design. The design of molecules can be automated using generative models [83]. They are useful for homogeneous catalysis whenever new catalysts are sought. This can be as complicated as exploring transition-metal complexes catalyzing a new transformation but can also be as straightforward as finding phosphine ligands for efficient cross-couplings. In the context of deep learning, deep generative models have entered the public discussion via
Explaining Black Boxes as a New Paradigm
At present, progress in ML comes at an astonishing pace, but it takes time for the most recent developments to enter other fields. One of the outstanding challenges is explainable artificial intelligence (AI), also termed the interpretability problem [105]. The current black-box nature of many ML approaches is unsatisfying, as Eugene Wigner said [106]: ‘It is nice to know that the computer understands the problem. But I would like to understand it too.’ Accordingly, the importance of
Concluding Remarks
The path toward autonomous catalyst discovery is far from linear, as many design and implementation choices remain to be decided (see Outstanding Questions). Accordingly, we foreshadow the future of homogeneous catalyst design as a maze (Figure 5) with autonomous closed-loop discovery as the ultimate dream. We incorporated what we envision to be important milestones as forks along the path. However, while Figure 5 depicts only one path toward the center of the maze, as the proverb goes, ‘All
Acknowledgments
G.d.P.G. gratefully acknowledges the Natural Sciences and Engineering Research Council of Canada (NSERC) for the Banting Postdoctoral Fellowship. We acknowledge the Defense Advanced Research Projects Agency (DARPA) under the Accelerated Molecular Discovery Program under Cooperative Agreement No. HR00111920027 dated August 1, 2019. The content of the information presented in this work does not necessarily reflect the position or the policy of the Government. A.A-G. thanks Anders G. Frøseth for
Disclaimer Statement
A.A-G. is co-founder and Chief Visionary Officer of Kebotix, Inc.
Glossary
- Artificial neural network: network of connected artificial neurons used for computation. The input of artificial neurons is processed with specific mathematical operations and the output is transmitted to all connected artificial neurons. The input of the network is propagated through the entire network to provide the final output of the computation.
- Closed-loop optimization: optimization strategy combining both computational and experimental techniques and data to infer subsequent evaluations and
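
The artificial neural network entry in the glossary can be illustrated with a small forward pass. This is only a sketch under stated assumptions: the layers are taken to be fully connected, the "specific mathematical operations" are taken to be a weighted sum followed by a tanh activation, and the weights in `TOY_NETWORK` are arbitrary hypothetical numbers rather than a trained model.

```python
import math


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of its inputs plus a bias, passed through tanh."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(z)


def layer(inputs, weight_rows, biases):
    """Apply every neuron of a fully connected layer to the same input vector."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def forward(inputs, layers):
    """Propagate the network input through all layers and return the final output."""
    activations = list(inputs)
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations


# Hypothetical toy network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
TOY_NETWORK = [
    ([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.0]),  # hidden layer weights and biases
    ([[1.0, -1.0]], [0.0]),                   # output layer weights and biases
]
```

Calling `forward([1.0, 2.0], TOY_NETWORK)` returns a list with a single value between -1 and 1, since the output neuron uses tanh. Training, that is, adjusting the weights from data, is deliberately left out of this sketch.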
References (117)
Predictive and mechanistic multivariate linear regression models for reaction development
Chem. Sci. (2018)
A graph-convolutional neural network model for the prediction of chemical reactivity
Chem. Sci. (2019)
“Found in translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models
Chem. Sci. (2018)
A structure-based platform for predicting chemical reactivity
Chem (2020)
Machine learning dihydrogen activation in the chemical space surrounding Vaska’s complex
Chem. Sci. (2020)
AFLOW: an automatic framework for high-throughput materials discovery
Comput. Mater. Sci. (2012)
Database for catalysis design
Catal. Today (1991)
Recent advances in high-throughput automated powder dispensing platforms for pharmaceutical applications
Org. Process Res. Dev. (2020)
High throughput reaction screening using desorption electrospray ionization mass spectrometry
Chem. Sci. (2018)
A graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space
Chem. Sci. (2019)
Design of experiments (DoE) and process optimization. A review of recent publications
Org. Process Res. Dev.
Chimera: enabling hierarchy based multi-objective optimization for self-driving laboratories
Chem. Sci.
Summarizing comments on the discussion and a prospectus for urgent future action
Philos. Trans. R. Soc. Lond. A
Mapping global flows of chemicals: from fossil fuel feedstocks to chemical products
Environ. Sci. Technol.
Chemical industry and homogeneous catalysis
Impact of Advances in Computing and Communications Technologies on Chemical Science and Technology: Report of a Workshop
Inverse molecular design using machine learning: generative models for matter engineering
Science
Homogeneously catalyzed industrial processes
Applied hydroformylation
Chem. Rev.
The mechanism of the Wacker reaction: a tale of two hydroxypalladations
Angew. Chem. Int. Ed.
Palladium-catalyzed cross-coupling reactions of organoboron compounds
Chem. Rev.
Computational prediction of small-molecule catalysts
Nature
Computationally guided catalyst design in the type I dynamic kinetic asymmetric Pauson–Khand reaction of allenyl acetates
J. Am. Chem. Soc.
Computational ligand design in enantio- and diastereoselective ynamide [5+2] cycloisomerization
Nat. Commun.
AARON: an automated reaction optimizer for new catalysts
J. Chem. Theory Comput.
Automated in silico design of homogeneous catalysts
ACS Catal.
A computationally designed Rh(I)-catalyzed two-component [5+2+1] cycloaddition of ene-vinylcyclopropanes and CO for the synthesis of cyclooctenones
J. Am. Chem. Soc.
Computational ligand design for the reductive elimination of ArCF3 from a small bite angle PdII complex: remarkable effect of a perfluoroalkyl phosphine
Angew. Chem. Int. Ed.
Comparing quantitative prediction methods for the discovery of small-molecule chiral catalysts
Nat. Rev. Chem.
Machine learning the ropes: principles, applications and directions in synthetic chemistry
Chem. Soc. Rev.
Interrogating selectivity in catalysis using molecular vibrations
Nature
Parametrization of non-covalent interactions for transition state interrogation applied to asymmetric catalysis
J. Am. Chem. Soc.
Iterative supervised principal component analysis driven ligand design for regioselective Ti-catalyzed pyrrole synthesis
ACS Catal.
Predicting reaction performance in C–N cross-coupling using machine learning
Science
Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning
Science
Rapid virtual screening of enantioselective catalysts using CatVS
Nat. Catal.
Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies
Chem. Sci.
Hydrogénations et déshydrogénations par catalyse
Ber. Dtsch. Chem. Ges.
A generalized picture of C–C cross-coupling
ACS Catal.
Activity-based screening of homogeneous catalysts through the rapid assessment of theoretically derived turnover frequencies
ACS Catal.
The effect of structure upon the reactions of organic compounds. Benzene derivatives
J. Am. Chem. Soc.
Handbook of Molecular Descriptors
Predicting electronic structure properties of transition metal complexes with neural networks
Chem. Sci.
Convolutional networks on graphs for learning molecular fingerprints
Adv. Neural Inf. Process. Syst.
Neural message passing for quantum chemistry
Deep learning for deep chemistry: optimizing the prediction of chemical patterns
Front. Chem.
SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules
J. Chem. Inf. Comput. Sci.
Molecular Transformer: a model for uncertainty-calibrated chemical reaction prediction
ACS Cent. Sci.
Convolutional neural network based on SMILES representation of compounds for detecting chemical motif
BMC Bioinformatics
Self-referencing embedded strings (SELFIES): a 100% robust molecular string representation
Mach. Learn. Sci. Technol.
- 5
These authors contributed equally to this work