Artificial applicability labels for improving policies in retrosynthesis prediction, Machine Learning: Science and Technology



Machine Learning: Science and Technology (IF 6.3), Pub Date: 2020-12-31, DOI: 10.1088/2632-2153/abcf90
Esben Jannik Bjerrum¹, Amol Thakkar¹,², Ola Engkvist¹


Automated retrosynthetic planning algorithms are a research area of increasing importance. Automated reaction-template extraction from large datasets, in conjunction with neural-network-enhanced tree-search algorithms, can find plausible routes to target compounds in seconds. However, the current method for training neural networks to predict suitable templates for a given target product leads to many predictions that are not applicable in silico. Most templates in the top 50 suggested templates cannot be applied to the target molecule to perform the virtual reaction. Here, we describe how to generate data and train a neural network policy that predicts whether templates are applicable or not. First, we generate a massive training dataset by applying each retrosynthetic template to each product from our reaction database. Second, we train a neural network to perform near-perfect prediction of the applicability labels on a held-out test set. The trained network is then joined with a policy model trained to predict and prioritize templates using the labels from the original dataset. The combined model was found to outperform the policy model in a route-finding task using 1700 compounds from our internal drug-discovery projects.
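The two steps described in the abstract, exhaustive label generation and joining the applicability model with the policy, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `applicability_labels`, `rerank`, and `toy_is_applicable` are hypothetical helpers, the substring check merely stands in for applying a real reaction template with a cheminformatics toolkit, and filtering by a 0.5 threshold is only one plausible way to join the two models (the abstract does not specify how they are combined).

```python
from typing import Callable, List, Sequence


def applicability_labels(
    products: Sequence[str],
    templates: Sequence[str],
    is_applicable: Callable[[str, str], bool],
) -> List[List[int]]:
    """Apply every template to every product; return one 0/1 row per product."""
    return [
        [1 if is_applicable(template, product) else 0 for template in templates]
        for product in products
    ]


def rerank(
    policy_probs: Sequence[float],
    applicability_probs: Sequence[float],
    threshold: float = 0.5,
) -> List[int]:
    """Return template indices predicted applicable, ordered by policy probability.

    Assumption: the models are joined by filtering. The source does not
    state the actual combination rule.
    """
    kept = [i for i, prob in enumerate(applicability_probs) if prob >= threshold]
    return sorted(kept, key=lambda i: policy_probs[i], reverse=True)


def toy_is_applicable(template: str, product: str) -> bool:
    """Hypothetical stand-in for template application: plain substring match."""
    return template in product


# Toy data: strings that look like SMILES but are treated as plain text here.
products = ["CC(=O)N", "CCO"]
templates = ["C(=O)N", "CO", "Br"]

labels = applicability_labels(products, templates, toy_is_applicable)
ranking = rerank(policy_probs=[0.2, 0.5, 0.3], applicability_probs=[0.9, 0.1, 0.8])

print(labels)
print(ranking)
```

In this toy run the template with the highest policy probability is dropped because it is predicted inapplicable, which is the failure mode the abstract attributes to policies trained on the original labels alone.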


Updated: 2020-12-31
