ALipSol: An Attention-Driven Mixture-of-Experts Model for Lipophilicity and Solubility Prediction
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2022-11-23, DOI: 10.1021/acs.jcim.2c01290
Jialu Wu 1,2, Junmei Wang 3, Zhenxing Wu 1,2, Shengyu Zhang 4, Yafeng Deng 2, Yu Kang 1, Dongsheng Cao 5, Chang-Yu Hsieh 1, Tingjun Hou 1

Lipophilicity (logD) and aqueous solubility (logSw) play a central role in drug development. Accurate prediction of these properties remains an open problem, largely because of data scarcity. Current methodologies neglect the intrinsic relationships between physicochemical properties and usually ignore ionization effects. Here, we propose an attention-driven mixture-of-experts (MoE) model named ALipSol, which explicitly reproduces the hierarchy of task relationships. We adopt the principle of divide-and-conquer by breaking down the complex end point (logD or logSw) into simpler ones (acidic pKa, basic pKa, and logP) and allocating a specific expert network to each subproblem. Subsequently, we implement transfer learning to extract knowledge from related tasks, thus alleviating the problem of limited data. Additionally, we replace the gating network with an attention mechanism to better capture the dynamic task relationships on a per-example basis. We adopt local fine-tuning and consensus prediction to further boost model performance. Extensive evaluation experiments verify the success of the ALipSol model, which achieves RMSE improvements of 8.04%, 2.49%, 8.57%, 12.8%, and 8.60% on the Lipop, ESOL, AqSolDB, external logD, and external logS data sets, respectively, compared with Attentive FP and the state-of-the-art in silico tools. In particular, our model yields more significant advantages (Welch’s t-test) on small training sets, implying high robustness and generalizability. The interpretability analysis shows that the atom contributions learned by ALipSol are more reasonable than those of the vanilla Attentive FP, and that the substitution effects in benzene derivatives agree well with empirical constants, revealing the potential of our model to extract useful patterns from data and provide guidance for lead optimization.
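
As a rough illustration of the idea of replacing a gating network with attention over expert outputs, the following minimal sketch weights three expert embeddings on a per-example basis. It is not the ALipSol architecture: the function `attention_mix`, the task `query` vector, the embedding size, and all numeric values are hypothetical and chosen only to show the mechanism (scaled dot-product scores followed by a softmax).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attention_mix(expert_embeddings, query):
    """Combine expert embeddings with per-example attention weights.

    expert_embeddings: dict mapping expert name -> embedding (list of floats)
    query: hypothetical learned query vector for the target task
    Returns (weights keyed by expert name, mixed embedding).
    """
    names = list(expert_embeddings)
    dim = len(query)
    # Scaled dot-product score between the task query and each expert output.
    scores = [dot(query, expert_embeddings[n]) / math.sqrt(dim) for n in names]
    weights = softmax(scores)
    # Weighted sum of the expert embeddings, one coordinate at a time.
    mixed = [
        sum(w * expert_embeddings[n][i] for w, n in zip(weights, names))
        for i in range(dim)
    ]
    return dict(zip(names, weights)), mixed

# Toy embeddings standing in for the outputs of three pretrained experts
# for one molecule (made-up values, for illustration only).
experts = {
    "acidic_pKa": [0.2, -0.1, 0.4],
    "basic_pKa": [0.0, 0.3, -0.2],
    "logP": [0.9, 0.5, 0.1],
}
query = [1.0, 0.5, 0.0]  # hypothetical task query, e.g. for logD

weights, mixed = attention_mix(experts, query)
print(weights)
print(mixed)
```

Because the weights are recomputed from each molecule's expert embeddings, two molecules can rely on the experts in different proportions, which is the per-example behavior the abstract attributes to the attention mechanism. In a trained model the query and the expert networks would be learned; here they are fixed toy values.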

Updated: 2022-11-23