Retrosynthesis prediction using grammar-based neural machine translation: An information-theoretic approach

Computers & Chemical Engineering (IF 4.3), Pub Date: 2021-09-11, DOI: 10.1016/j.compchemeng.2021.107533
Vipul Mann, Venkat Venkatasubramanian

Retrosynthetic prediction is one of the main challenges in chemical synthesis because it requires a search over the space of plausible chemical reactions, which often results in complex, multi-step, branched synthesis trees for even moderately complex organic reactions. Here, we propose an approach that performs single-step retrosynthesis prediction using SMILES grammar-based representations in a neural machine translation framework. Information-theoretic analyses of such grammar representations reveal that they are superior to SMILES representations and better suited for machine learning tasks due to their underlying redundancy and high information capacity. We report a top-1 prediction accuracy of 43.8% (syntactic validity 95.6%) and a maximal fragment (MaxFrag) accuracy of 50.4%. Comparing our model’s performance with previous work that used character-based SMILES representations demonstrates a significant reduction in grammatically invalid predictions and improved prediction accuracy. Fewer invalid predictions for both known and unknown reaction class scenarios demonstrate the model’s ability to learn the underlying SMILES grammar efficiently.
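
The abstract refers to the redundancy of a molecular representation without stating how it was measured. As a rough illustration only, the sketch below computes the empirical Shannon entropy and the classical redundancy 1 - H/H_max of a token sequence. It assumes plain character-level tokenization of a SMILES string (aspirin is used as an arbitrary example), not the grammar-based representation from the paper, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Empirical Shannon entropy of a token sequence, in bits per token."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def redundancy(tokens):
    """Classical redundancy 1 - H / H_max, with H_max = log2(alphabet size).

    With fewer than two distinct symbols H_max is zero and the ratio is
    undefined; 0.0 is returned here purely as a convention of this sketch.
    """
    alphabet_size = len(set(tokens))
    if alphabet_size < 2:
        return 0.0
    return 1.0 - shannon_entropy(tokens) / math.log2(alphabet_size)


# Assumption: character-level tokens of an example SMILES string (aspirin).
smiles = "CC(=O)Oc1ccccc1C(=O)O"
tokens = list(smiles)
print(f"entropy    = {shannon_entropy(tokens):.3f} bits/token")
print(f"redundancy = {redundancy(tokens):.3f}")
```

Running the same two functions on tokens from a different representation of the same molecules would give one simple way to compare representations, though the metrics used in the paper itself may differ.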

Updated: 2021-09-20