SMILES-based deep generative scaffold decorator for de-novo drug design
Journal of Cheminformatics (IF 8.6), Pub Date: 2020-05-29, DOI: 10.1186/s13321-020-00441-8
Josep Arús-Pous, Atanas Patronov, Esben Jannik Bjerrum, Christian Tyrchan, Jean-Louis Reymond, Hongming Chen, Ola Engkvist

Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e., partially-built molecules with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained from any arbitrary molecular set. This approach is possible thanks to a new molecular set pre-processing algorithm that exhaustively slices all possible combinations of acyclic bonds of every molecule, combinatorically obtaining a large number of scaffolds with their respective decorations. Moreover, it serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets. Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described: First, models were trained with a training set obtained from a small set of Dopamine Receptor D2 (DRD2) active modulators and were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules). In this case, the resulting scaffolds with decorations were filtered only to allow those that included fragment-like decorations. This filtering process allowed models trained with this dataset to selectively decorate diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the models were already able to decorate molecules using specific knowledge without the need to add it with other techniques, such as reinforcement learning. 
We envision that this architecture will become a useful addition to the already existent architectures for de novo molecular generation.
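
The slicing step described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the molecule is a toy graph of atom indices rather than a SMILES string, and the names `components`, `acyclic_bonds`, `slice_molecule` and the `max_cuts` parameter are hypothetical. The validity rule used here (one fragment, the scaffold, carries every attachment point, and each remaining fragment, a decoration, carries exactly one) is an assumption about what "scaffolds with their respective decorations" means.

```python
from itertools import combinations


def components(n_atoms, bonds):
    """Return the connected components of the graph as frozensets of atom indices."""
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps


def acyclic_bonds(n_atoms, bonds):
    """A bond is acyclic (not part of a ring) if removing it disconnects the graph."""
    return [b for b in bonds
            if len(components(n_atoms, [x for x in bonds if x != b])) > 1]


def slice_molecule(n_atoms, bonds, max_cuts=3):
    """Enumerate (scaffold, decorations) pairs over all combinations of acyclic bonds.

    Assumed validity rule: the scaffold holds all k attachment points and
    every other fragment holds exactly one.
    """
    results = []
    cuttable = acyclic_bonds(n_atoms, bonds)
    for k in range(1, max_cuts + 1):
        for cut in combinations(cuttable, k):
            remaining = [b for b in bonds if b not in cut]
            frags = components(n_atoms, remaining)
            # Number of attachment points (cut bond ends) on each fragment.
            points = {f: sum((a in f) + (b in f) for a, b in cut) for f in frags}
            for scaffold in frags:
                others = [f for f in frags if f != scaffold]
                if points[scaffold] == k and all(points[f] == 1 for f in others):
                    results.append((scaffold, others))
    return results


# Toy molecule: a six-membered ring (atoms 0-5) with a one-atom substituent
# (atom 6 on atom 0) and a two-atom chain (atoms 7-8 on atom 3).
RING = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
SUBSTITUENTS = [(0, 6), (3, 7), (7, 8)]
pairs = slice_molecule(9, RING + SUBSTITUENTS)

for scaffold, decorations in pairs:
    print(sorted(scaffold), [sorted(d) for d in decorations])
```

Even this nine-atom toy graph yields several scaffold and decoration pairs from a single molecule, which is the sense in which the abstract describes the slicing as a data augmentation technique. A real pipeline would operate on parsed molecules with a cheminformatics toolkit and write attachment points into the scaffold SMILES; those details are outside this sketch.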

Updated: 2020-05-29