Data-driven molecular design for discovery and synthesis of novel ligands: a case study on SARS-CoV-2
Machine Learning: Science and Technology (IF 6.013), Pub Date: 2021-04-09, DOI: 10.1088/2632-2153/abe808
Jannis Born 1, 2 , Matteo Manica 1 , Joris Cadow 1 , Greta Markert 1 , Nil Adell Mill 1 , Modestas Filipavicius 1 , Nikita Janakarajan 1 , Antonio Cardinale 1, 3 , Teodoro Laino 1 , María Rodríguez Martínez 1

Bridging systems biology and drug design, we propose a deep learning framework for de novo discovery of molecules tailored to bind given protein targets. Our methodology is exemplified by the task of designing antiviral candidates that target SARS-CoV-2-related proteins. Crucially, our framework does not require fine-tuning for specific proteins; instead, it is shown to generalize, proposing ligands with high predicted binding affinities against unseen targets. Coupling our framework with the automatic retrosynthesis prediction of IBM RXN for Chemistry, we demonstrate the feasibility of swift chemical synthesis of molecules with potential antiviral properties that were designed against a specific protein target. In particular, we synthesize an antiviral candidate designed against the host protein angiotensin converting enzyme 2 (ACE2), a surface receptor on human respiratory epithelial cells that facilitates SARS-CoV-2 cell entry through the viral spike glycoprotein.

This is achieved as follows. First, we train a multimodal ligand–protein binding affinity model to predict the affinities of bioactive compounds to target proteins, and we couple this model with pharmacological toxicity predictors. Using this multi-objective score as the reward function of a conditional molecular generator that consists of two variational autoencoders (VAEs), our framework steers generation toward regions of chemical space with high-reward molecules. Specifically, we explore the challenging setting of generating ligands against unseen protein targets by performing leave-one-out cross-validation on 41 SARS-CoV-2-related target proteins. Using deep reinforcement learning, we show that in 35 out of 41 cases the generation is biased towards sampling binding ligands, with an average increase of 83% compared to an unbiased VAE. The generated molecules exhibit favorable properties in terms of target binding affinity, selectivity and drug-likeness. We use molecular retrosynthetic models to provide a synthetic accessibility assessment of the best generated hit molecules. Finally, with this end-to-end framework, we synthesize 3-bromobenzylamine, a potential inhibitor of the host ACE2 protein, solely based on the recommendations of a molecular retrosynthesis model and a synthesis protocol prediction model. We hope that our framework can contribute towards swift discovery of de novo molecules with desired pharmacological properties.
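As a rough illustration of two ideas in the paragraph above (a multi-objective reward and leave-one-out evaluation over targets), a minimal Python sketch might look as follows. The abstract does not give the exact form of the reward, so the weighted sum, the `affinity_weight` parameter, the assumption that both predictors output values in [0, 1], and the placeholder target names are all illustrative assumptions, not the authors' implementation.

```python
from typing import Iterator, List, Tuple


def combined_reward(affinity: float, toxicity_prob: float,
                    affinity_weight: float = 0.5) -> float:
    """Hypothetical multi-objective reward.

    Assumes `affinity` is a predicted binding score in [0, 1] and
    `toxicity_prob` is a predicted probability of toxicity in [0, 1].
    The weighted-sum form is an assumption made for illustration only.
    """
    for name, value in (("affinity", affinity),
                        ("toxicity_prob", toxicity_prob),
                        ("affinity_weight", affinity_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    # Reward high predicted affinity and low predicted toxicity.
    return (affinity_weight * affinity
            + (1.0 - affinity_weight) * (1.0 - toxicity_prob))


def leave_one_out(targets: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training_targets, held_out_target) pairs, one per target."""
    for i, held_out in enumerate(targets):
        yield targets[:i] + targets[i + 1:], held_out


# Placeholder names standing in for the 41 SARS-CoV-2-related targets.
targets = [f"target_{i}" for i in range(41)]
splits = list(leave_one_out(targets))
```

Each of the 41 splits holds out one protein, so a generator trained on the remaining 40 is always evaluated on a target it has not seen, which matches the "unseen targets" setting described in the abstract.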




Updated: 2021-04-09