Prediction of higher-selectivity catalysts by computer-driven workflow and machine learning, Science



Science (IF 56.9), Pub Date: 2019-01-17, DOI: 10.1126/science.aau5631
Andrew F Zahrt, Jeremy J Henle, Brennan T Rose, Yang Wang, William T Darrow, Scott E Denmark


Predicting catalyst selectivity

Asymmetric catalysis is widely used in chemical research and manufacturing to access just one of two possible mirror-image products. Nonetheless, the process of tuning catalyst structure to optimize selectivity is still largely empirical. Zahrt et al. present a framework for more efficient, predictive optimization. As a proof of principle, they focused on a known coupling reaction of imines and thiols catalyzed by chiral phosphoric acid compounds. By modeling multiple conformations of more than 800 prospective catalysts and then training machine-learning algorithms on a subset of experimental results, they achieved highly accurate predictions of enantioselectivities.

Science, this issue p. eaau5631

A model encompassing multiple conformations of chiral phosphoric acid catalysts accurately predicts enantioselectivities.

INTRODUCTION
The development of new synthetic methods in organic chemistry is traditionally accomplished through empirical optimization. Catalyst design, wherein experimentalists attempt to qualitatively identify correlations between catalyst structure and catalyst efficiency, is no exception. However, this approach is plagued by numerous deficiencies, including the lack of mechanistic understanding of a new transformation, the inherent limitations of human cognitive abilities to find patterns in large collections of data, and the lack of quantitative guidelines to aid catalyst identification.
Chemoinformatics provides an attractive alternative to empiricism for several reasons: mechanistic information is not a prerequisite; catalyst structures can be characterized by three-dimensional (3D) descriptors (numerical representations of molecular properties derived from the 3D molecular structure) that quantify the steric and electronic properties of thousands of candidate molecules; and the suitability of a given catalyst candidate can be quantified by comparing its properties with a computationally derived model trained on experimental data. The ability to accurately predict a selective catalyst from a set of less-than-optimal data remains a major goal for machine learning in asymmetric catalysis. We report a method to achieve this goal and propose a more efficient alternative to traditional catalyst design.

RATIONALE
The workflow we have created consists of the following components: (i) construction of an in silico library comprising a large collection of conceivable, synthetically accessible catalysts derived from a particular scaffold; (ii) calculation of relevant chemical descriptors for each scaffold; (iii) selection of a representative subset of the catalysts [this subset is termed the universal training set (UTS) because it is agnostic to reaction or mechanism and thus can be used to optimize any reaction catalyzed by that scaffold]; (iv) collection of the training data; and (v) application of machine learning methods to generate models that predict the enantioselectivity of each member of the in silico library. These models are evaluated with an external test set of catalysts, that is, by predicting the selectivities of catalysts outside the training data. The validated models can then be used to select the optimal catalyst for a given reaction.
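Step (iii), choosing a representative subset of the library, can be illustrated with a short sketch. The abstract does not say how the UTS is picked beyond its being based on steric and electronic descriptors (the figure legend mentions principal components), so the greedy max-min (farthest-point) selection below is a swapped-in, commonly used diversity-sampling technique, not the authors' procedure. The function name, catalyst labels, and two-dimensional coordinates are hypothetical stand-ins for PCA-reduced descriptors.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical: each catalyst label maps to a point in a reduced descriptor
# space (for example, its first principal-component scores).
Library = Dict[str, Sequence[float]]


def select_training_set(library: Library, k: int) -> List[str]:
    """Greedy max-min (farthest-point) selection of k diverse catalysts.

    Illustrative stand-in for choosing a universal training set (UTS);
    not the procedure used in the paper.
    """
    if not 0 < k <= len(library):
        raise ValueError("k must be between 1 and the library size")
    names = sorted(library)  # fixed iteration order keeps results deterministic
    dim = len(library[names[0]])

    # Seed with the catalyst closest to the centroid of the whole library.
    centroid = [sum(library[n][d] for n in names) / len(names) for d in range(dim)]
    first = min(names, key=lambda n: math.dist(library[n], centroid))
    chosen = [first]

    # Distance from each catalyst to its nearest already-chosen catalyst.
    nearest = {n: math.dist(library[n], library[first]) for n in names}

    while len(chosen) < k:
        # Add the candidate that is farthest from everything chosen so far.
        remaining = [n for n in names if n not in chosen]
        pick = max(remaining, key=lambda n: nearest[n])
        chosen.append(pick)
        for n in names:
            nearest[n] = min(nearest[n], math.dist(library[n], library[pick]))
    return chosen


# Toy library (hypothetical): cat_A and cat_B are near-duplicates.
TOY_LIBRARY = {
    "cat_A": (0.0, 0.0),
    "cat_B": (0.5, 0.0),
    "cat_C": (10.0, 0.0),
    "cat_D": (0.0, 4.0),
    "cat_E": (9.0, 9.0),
}

print(select_training_set(TOY_LIBRARY, 3))  # → ['cat_D', 'cat_C', 'cat_E']
```

Seeding from the catalyst nearest the library centroid makes the result deterministic, and each later pick maximizes the distance to the already-chosen set, so near-duplicates such as cat_A and cat_B are unlikely to both enter a small training set.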
RESULTS
To demonstrate the viability of our method, we predicted reaction outcomes with substrate combinations and catalysts different from the training data and simulated a situation in which highly selective reactions had not been achieved. In the first demonstration, a model was constructed by using support vector machines and validated with three different external test sets. The first test set evaluated the model's ability to predict the selectivity of reactions forming new products, using only catalysts from the training set. The model performed well, with a mean absolute deviation (MAD) of 0.161 kcal/mol. Next, the same model was used to predict the selectivity of an external test set of catalysts with substrate combinations from the training set. The model remained highly accurate, with a MAD of 0.211 kcal/mol. Finally, reactions forming new products with the external test catalysts were predicted with a MAD of 0.236 kcal/mol. In the second study, no reactions with selectivity above 80% enantiomeric excess (ee) were used as training data. Deep feed-forward neural networks accurately reproduced the experimental selectivity data and successfully predicted the most selective reactions. More notably, the general trends in selectivity, on the basis of average catalyst selectivity, were correctly identified. Despite omitting about half of the experimental free energy range from the training data, we could still make accurate predictions in this region of selectivity space.

CONCLUSION
The capability to predict selective catalysts has the potential to change the way chemists select and optimize chiral catalysts, shifting from an empirically guided to a mathematically guided approach.

Chemoinformatics-guided optimization protocol. (A) Generation of a large in silico library of catalyst candidates. (B) Calculation of robust chemical descriptors. (C) Selection of a UTS. (D) Acquisition of experimental selectivity data. (E) Application of machine learning to use moderate- to low-selectivity reactions to predict high-selectivity reactions. R, any group; X, O or S; Y, OH, SH, or NHTf; PC, principal component; ΔΔG, mean selectivity.

Catalyst design in asymmetric reaction development has traditionally been driven by empiricism, wherein experimentalists attempt to qualitatively recognize structural patterns to improve selectivity. Machine learning algorithms and chemoinformatics can potentially accelerate this process by recognizing otherwise inscrutable patterns in large datasets. Herein we report a computationally guided workflow for chiral catalyst selection using chemoinformatics at every stage of development. Robust molecular descriptors that are agnostic to the catalyst scaffold allow for selection of a universal training set on the basis of steric and electronic properties. This set can be used to train machine learning methods to make highly accurate predictive models over a broad range of selectivity space. Using support vector machines and deep feed-forward neural networks, we demonstrate accurate predictive modeling in the chiral phosphoric acid–catalyzed thiol addition to N-acylimines.
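The MAD values reported in the results are free-energy differences (ΔΔG, kcal/mol), while the training cutoff in the second study is stated in enantiomeric excess. A minimal sketch of how the two scales relate, assuming the standard relation ΔΔG = RT ln(er), where er = (1 + ee)/(1 - ee) is the enantiomeric ratio, and an assumed temperature of 298.15 K (the abstract does not state one). The function names, reaction IDs, ee values, and predictions below are hypothetical and are not data from the paper.

```python
import math
from typing import Dict, List, Sequence

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
T_ASSUMED = 298.15    # ASSUMPTION: the abstract does not state the reaction temperature


def ee_to_ddg(ee_percent: float, temperature: float = T_ASSUMED) -> float:
    """Convert enantiomeric excess (%) to ddG (kcal/mol) via ddG = RT ln(er)."""
    ee = ee_percent / 100.0
    if not 0.0 <= ee < 1.0:
        raise ValueError("ee must lie in [0, 100)")
    er = (1.0 + ee) / (1.0 - ee)  # enantiomeric ratio, major:minor
    return R_KCAL * temperature * math.log(er)


def ddg_to_ee(ddg: float, temperature: float = T_ASSUMED) -> float:
    """Inverse of ee_to_ddg: ddG (kcal/mol) back to ee (%)."""
    er = math.exp(ddg / (R_KCAL * temperature))
    return 100.0 * (er - 1.0) / (er + 1.0)


def mean_absolute_deviation(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute difference between predicted and measured ddG values."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Second-study style split: keep only reactions at or below 80% ee for training.
CUTOFF_DDG = ee_to_ddg(80.0)
print(round(CUTOFF_DDG, 2))  # → 1.3 (kcal/mol at the assumed 298.15 K)

# Hypothetical measured selectivities (reaction id -> ee %), illustration only.
MEASURED_EE: Dict[str, float] = {"rxn1": 45.0, "rxn2": 78.0, "rxn3": 92.0, "rxn4": 99.0}
train_ids: List[str] = [r for r, ee in MEASURED_EE.items() if ee_to_ddg(ee) <= CUTOFF_DDG]
print(train_ids)  # → ['rxn1', 'rxn2']

# Hypothetical predicted vs. measured ddG values (kcal/mol) for a test set.
print(round(mean_absolute_deviation([1.0, 1.5, 2.0], [1.2, 1.4, 2.3]), 3))  # → 0.2
```

Because the ee-to-ΔΔG mapping is monotonic, filtering on ΔΔG or on ee gives the same training set; working in ΔΔG simply keeps the cutoff in the same units (kcal/mol) as the reported MAD values.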


Updated: 2019-01-17
