Generating molecules with optimized aqueous solubility using iterative graph translation
Reaction Chemistry & Engineering (IF 3.9), Pub Date: 2021-11-15, DOI: 10.1039/d1re00315a
Camille Bilodeau 1, Wengong Jin 2, Hongyun Xu 3, Jillian A. Emerson 3, Sukrit Mukhopadhyay 3, Thomas H. Kalantar 3, Tommi Jaakkola 2, Regina Barzilay 2, Klavs F. Jensen 1

While molecular discovery is critical to solving many scientific problems, the time and resource costs of experiments make it intractable to explore chemical space exhaustively. Here, we present a generative modeling framework that proposes novel molecules that are 1) based on starting candidate structures and 2) optimized with respect to one or more objectives or constraints. We explore how this framework performs in an applied setting by focusing on the problem of optimizing molecules for aqueous solubility, using an experimental database of solubility data curated from the literature. The resulting model was capable of improving molecules across a range of starting solubilities. When synthetic feasibility was applied as a secondary optimization constraint (estimated using a combination of synthetic accessibility and retrosynthetic accessibility scores), the model generated synthetically feasible molecules 83.0% of the time, compared with 59.9% of the time without the constraint. To validate model performance experimentally, a set of candidate molecules was translated using the model, and the solubilities of both the candidate and the generated molecules were measured experimentally. We additionally validated model performance against experimental measurements by holding out the 100 most soluble molecules during training and showing that the model could rediscover 33 of them. To determine the sensitivity of model performance to dataset size, we trained the model on different subsets of the initial training dataset. Model performance did not decrease significantly when the model was trained on a random 50% subset of the training data, but it did decrease when the model was trained on a subset containing only less soluble molecules (i.e., the bottom 50%). Overall, this framework serves as a tool for generating optimized, synthetically feasible molecules that can be applied to a range of problems in chemistry and chemical engineering.
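The dataset experiments described in the abstract (holding out the most soluble molecules, training on a random 50% versus the least soluble 50%) and the feasibility rate can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: records are assumed to be (SMILES, solubility) pairs with unique, canonicalized SMILES; all helper names are hypothetical; and the feasibility rule (SA score at most 4.5 and RA score at least 0.5) is an assumed threshold on the SAscore scale (1 = easy to 10 = hard) and a 0 to 1 retrosynthetic accessibility score, since the abstract does not state how the two scores were combined.

```python
import random
from typing import Iterable, List, Sequence, Set, Tuple

# Assumed record format (hypothetical): (canonical SMILES, measured log solubility).
Record = Tuple[str, float]


def hold_out_top_k(data: Sequence[Record], k: int) -> Tuple[List[Record], Set[str]]:
    """Split off the k most soluble molecules; return (training records, held-out SMILES)."""
    ranked = sorted(data, key=lambda r: r[1], reverse=True)
    held_out = {smiles for smiles, _ in ranked[:k]}
    train = [r for r in data if r[0] not in held_out]
    return train, held_out


def random_half(data: Sequence[Record], seed: int = 0) -> List[Record]:
    """Random 50% subset of the records, seeded for reproducibility."""
    rng = random.Random(seed)
    return rng.sample(list(data), len(data) // 2)


def bottom_half(data: Sequence[Record]) -> List[Record]:
    """The 50% of records with the lowest solubility."""
    ranked = sorted(data, key=lambda r: r[1])
    return ranked[: len(ranked) // 2]


def rediscovered(generated: Iterable[str], held_out: Set[str]) -> int:
    """Count held-out molecules that appear among the generated SMILES (exact string match)."""
    return len(held_out.intersection(generated))


def is_feasible(sa: float, ra: float, sa_max: float = 4.5, ra_min: float = 0.5) -> bool:
    """Hypothetical combined rule: low SA score (easier) AND high RA score (more accessible)."""
    return sa <= sa_max and ra >= ra_min


def feasible_fraction(scores: Sequence[Tuple[float, float]]) -> float:
    """Fraction of generated molecules, given as (SA, RA) pairs, that pass the rule."""
    if not scores:
        return 0.0
    return sum(is_feasible(sa, ra) for sa, ra in scores) / len(scores)


if __name__ == "__main__":
    toy = [(f"mol{i}", float(i)) for i in range(10)]  # toy data, not from the paper
    train, held = hold_out_top_k(toy, k=3)
    print(len(train), sorted(held))  # 7 ['mol7', 'mol8', 'mol9']
    print(len(random_half(train)), [s for s, _ in bottom_half(train)])
```

Under this reading, the reported hold-out result corresponds to `rediscovered(...)` returning 33 for k = 100, i.e., a 33% rediscovery rate. Exact string matching is only meaningful if held-out and generated SMILES are canonicalized with the same toolkit.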

Updated: 2021-11-24