GEN: highly efficient SMILES explorer using autodidactic generative examination networks, Journal of Cheminformatics



Journal of Cheminformatics (IF 8.6), Pub Date: 2020-04-10, DOI: 10.1186/s13321-020-00425-8
Ruud van Deursen, Peter Ertl, Igor V. Tetko, Guillaume Godin

Recurrent neural networks have been widely used to generate millions of de novo molecules in defined chemical spaces. Reported deep generative models are exclusively based on LSTM and/or GRU units and frequently trained using canonical SMILES. In this study, we introduce Generative Examination Networks (GEN) as a new approach to train deep generative networks for SMILES generation. In our GENs, we have used an architecture based on multiple concatenated bidirectional RNN units to enhance the validity of generated SMILES. GENs autonomously learn the target space in a few epochs and are stopped early using an independent online examination mechanism, measuring the quality of the generated set. Herein we have used online statistical quality control (SQC) on the percentage of valid molecular SMILES as examination measure to select the earliest available stable model weights. Very high levels of valid SMILES (95–98%) can be generated using multiple parallel encoding layers in combination with SMILES augmentation using unrestricted SMILES randomization. Our trained models combine an excellent novelty rate (85–90%) while generating SMILES with strong conservation of the property space (95–99%). In GENs, both the generative network and the examination mechanism are open to other architectures and quality criteria.
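The early-stopping idea in the abstract, selecting the earliest stable model weights from the percentage of valid SMILES measured during training, can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the function name `first_stable_epoch`, the sliding-window rule, and the thresholds `window`, `min_mean`, and `max_std` are all assumptions chosen for the example, and the validity percentages are invented. In practice each percentage would come from sampling SMILES from the model after an epoch and checking them with a cheminformatics toolkit.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def first_stable_epoch(
    validity_pct: Sequence[float],
    window: int = 3,          # assumed number of consecutive epochs to inspect
    min_mean: float = 95.0,   # assumed minimum average validity (%)
    max_std: float = 1.0,     # assumed maximum spread (%) to call it "stable"
) -> Optional[int]:
    """Return the 0-based index of the first epoch that closes a stable window.

    A window counts as stable when its mean validity is at least `min_mean`
    and its population standard deviation is at most `max_std`.
    Returns None if no window qualifies.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    for end in range(window, len(validity_pct) + 1):
        recent = validity_pct[end - window:end]
        if mean(recent) >= min_mean and pstdev(recent) <= max_std:
            return end - 1
    return None


# Hypothetical per-epoch validity percentages, invented for illustration.
history = [62.0, 81.5, 90.2, 95.8, 96.4, 96.1, 96.3]
print(first_stable_epoch(history))
```

With the invented history above, the first window that is both high and steady covers epochs 3 to 5, so the sketch returns 5. Other quality criteria could be substituted for the validity percentage, in line with the abstract's remark that the examination mechanism is open to other measures.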


Updated: 2020-04-10
