“Found in Translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models†


Chemical Science (IF 8.4), Pub Date: 2018-06-22, DOI: 10.1039/c8sc02339e
Philippe Schwaller, Théophile Gaudin, Dávid Lányi, Costas Bekas, Teodoro Laino


There is an intuitive analogy between an organic chemist's understanding of a compound and a language speaker's understanding of a word. Based on this analogy, it is possible to introduce the basic concepts of linguistic analysis to organic chemistry and analyze their potential impact. In this work, we cast the reaction prediction task as a translation problem by introducing a template-free sequence-to-sequence model that is trained end-to-end and is fully data-driven. We propose a tokenization scheme that is arbitrarily extensible with reaction information. Using an attention-based model borrowed from human language translation, we improve on state-of-the-art reaction prediction, achieving a top-1 accuracy of 80.3% without relying on auxiliary knowledge such as reaction templates or explicit atomic features. A top-1 accuracy of 65.4% is also reached on a larger and noisier dataset.
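To make the translation framing concrete, the sketch below shows one way a reaction written as a SMILES string could be split into tokens before being fed to a sequence-to-sequence model, and how top-1 accuracy could be scored. The regular expression, the function names `tokenize` and `top1_accuracy`, and the use of exact string matching are illustrative assumptions, not the tokenization or evaluation code from the paper.

```python
import re
from typing import List

# Assumed atom-level pattern (hypothetical, not taken from the paper):
# bracket atoms first, then two-letter halogens, two-digit ring closures,
# single-letter atoms, ring-closure digits, and bond/branch/reaction symbols.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%[0-9]{2}|[A-Za-z]|[0-9]|[()=#\-+\\/.:~@?>*$])"
)


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES or reaction SMILES string into tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Fail loudly if some character was not covered by the pattern.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def top1_accuracy(predictions: List[str], targets: List[str]) -> float:
    """Fraction of examples whose first-ranked prediction equals the target.

    Assumes both sides are already written in the same canonical form,
    so that plain string equality is a meaningful comparison.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets differ in length")
    if not targets:
        raise ValueError("no examples to score")
    hits = sum(p == t for p, t in zip(predictions, targets))
    return hits / len(targets)


if __name__ == "__main__":
    # Acetyl chloride plus water giving acetic acid, as reaction SMILES.
    print(" ".join(tokenize("CC(=O)Cl.O>>CC(=O)O")))
    print(top1_accuracy(["CCO", "CC"], ["CCO", "CO"]))
```

Matching multi-character units such as `Cl` or `[Na+]` before single letters keeps each atom as one token, which is what lets the source and target sides be treated like sentences in two languages.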


Updated: 2018-06-22
