Nonadditivity in public and inhouse data: implications for drug design
Journal of Cheminformatics (IF 8.6), Pub Date: 2021-07-02, DOI: 10.1186/s13321-021-00525-z
D Gogishvili (1, 2), E Nittinger (1), C Margreitter (3), C Tyrchan (1)

Numerous ligand-based drug discovery projects rely on structure-activity relationship (SAR) analysis, such as Free-Wilson (FW) or matched molecular pair (MMP) analysis. These methods intrinsically assume that substituent contributions are linear and additive. They are challenged by nonadditivity (NA) in protein–ligand binding, where changing two functional groups in one molecule results in much higher or lower activity than expected from the respective single changes. Identifying nonlinear cases and their possible underlying explanations is crucial for a drug design project, since it might influence which lead to follow. By systematically analyzing all AstraZeneca (AZ) inhouse compound data and publicly available ChEMBL25 bioactivity data, we show significant NA events in almost every second inhouse assay and in every third assay in public data sets. Furthermore, 9.4% of all compounds in the AZ database and 5.1% of compounds from public sources display significant additivity shifts, indicating important SAR features or fundamental measurement errors. Combining NA data with machine learning showed that nonadditive data are challenging to predict, and even adding nonadditive data to the training set did not increase predictivity. Overall, NA analysis should be applied routinely in many areas of computational chemistry and can further improve rational drug design.
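The quantity behind NA analysis can be illustrated with a double-mutant cycle: a parent compound, two single substituent changes, and the compound carrying both changes. The sketch below is a minimal illustration, not the authors' code. It assumes the common formulation in which activities are log-scale values (e.g. pIC50) and NA is the observed activity of the double change minus its additive expectation. The class and function names, the independent-noise model and the significance factor `k` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DoubleMutantCycle:
    """Four compounds forming one cycle, with log-scale activities (e.g. pIC50).

    a: parent compound
    b: parent with substituent change 1 only
    c: parent with substituent change 2 only
    d: parent with both changes
    """
    a: float
    b: float
    c: float
    d: float

    def expected_d(self) -> float:
        # Additive expectation: parent activity plus both single-change effects.
        return self.a + (self.b - self.a) + (self.c - self.a)

    def nonadditivity(self) -> float:
        # Observed minus expected activity of the double change;
        # algebraically equal to (d - c) - (b - a).
        return self.d - self.expected_d()


def na_standard_error(sigma: float) -> float:
    # Assumes each of the four activities has independent noise with standard
    # deviation sigma; NA combines four terms with weights +/-1, so its
    # standard deviation is sqrt(4 * sigma**2) = 2 * sigma.
    return 2.0 * sigma


def is_significant(na: float, sigma: float, k: float = 2.0) -> bool:
    # Hypothetical rule: flag |NA| larger than k standard errors of NA.
    return abs(na) > k * na_standard_error(sigma)


if __name__ == "__main__":
    cycle = DoubleMutantCycle(a=6.0, b=7.0, c=6.5, d=9.0)
    na = cycle.nonadditivity()
    print(cycle.expected_d(), na)          # 7.5 1.5
    print(is_significant(na, sigma=0.3))   # True
```

In this toy cycle the double change is 1.5 log units more potent than the single changes predict. Whether such a shift is flagged depends entirely on the assumed assay noise `sigma` and the factor `k`. A perfectly additive cycle gives an NA of zero.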

Updated: 2021-07-04