

QSAR-derived affinity fingerprints (part 2): modeling performance for potency prediction
Journal of Cheminformatics (IF 7.1), Pub Date: 2020-06-05, DOI: 10.1186/s13321-020-00444-5
Isidro Cortés-Ciriano¹,², Ctibor Škuta³, Andreas Bender¹, Daniel Svozil³,⁴


Affinity fingerprints report the activity of small molecules across a set of assays. They therefore make it possible to gather information about the bioactivities of structurally dissimilar compounds, where models based on chemical structure alone are often limited, and to model complex biological endpoints such as human toxicity and in vitro cancer cell line sensitivity. Here, we propose to model in vitro compound activity using computationally predicted bioactivity profiles as compound descriptors. To this end, we apply and validate a framework for calculating QSAR-derived affinity fingerprints (QAFFP) using a set of 1360 QSAR models built from Ki, Kd, IC50 and EC50 data in the ChEMBL database. QAFFP thus represent a method to encode and relate compounds on the basis of their similarity in bioactivity space. To benchmark the predictive power of QAFFP, we assembled IC50 data from the ChEMBL database for 18 diverse cancer cell lines widely used in preclinical drug discovery and for 25 diverse protein target data sets. This study complements part 1, which evaluates the performance of QAFFP in similarity searching, scaffold hopping, and bioactivity classification. Although QAFFP are inherently noisy, we show that using them as descriptors leads to prediction errors on the test set of roughly 0.65–0.95 pIC50 units, comparable to the estimated uncertainty of bioactivity data in ChEMBL (0.76–1.00 pIC50 units). The predictive power of QAFFP is slightly worse than that of Morgan2 fingerprints and 1D and 2D physicochemical descriptors, with an effect size of 0.02–0.08 pIC50 units. Including QSAR models with low predictive power in the generation of QAFFP does not improve predictive power. Given that the QSAR models used to compute the QAFFP were selected on the basis of data availability alone, we anticipate better modeling results for QAFFP generated from more diverse and biologically meaningful targets.
Data sets and Python code are publicly available at https://github.com/isidroc/QAFFP_regression.
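
The idea of using predicted bioactivity profiles as compound descriptors can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository above for that). The toy linear "QSAR models", the two-dimensional structural descriptors, the made-up pIC50 values, and the helper names `make_linear_model`, `affinity_fingerprint`, and `predict_nearest` are all assumptions introduced for illustration. A 1-nearest-neighbour lookup stands in for whatever downstream regressor one would actually train on the fingerprints.

```python
import math
from typing import Callable, List, Sequence

# A "model" is any callable mapping a structural descriptor vector to a
# predicted potency. Hypothetical stand-in for a trained QSAR model.
Model = Callable[[Sequence[float]], float]


def make_linear_model(weights: Sequence[float], bias: float) -> Model:
    """Build a toy linear model (illustrative only, not from the paper)."""
    def predict(descriptor: Sequence[float]) -> float:
        return bias + sum(w * x for w, x in zip(weights, descriptor))
    return predict


def affinity_fingerprint(descriptor: Sequence[float],
                         models: Sequence[Model]) -> List[float]:
    """One predicted activity per model: a vector of length len(models)."""
    return [model(descriptor) for model in models]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two compounds in predicted-bioactivity space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_nearest(query_fp: Sequence[float],
                    train_fps: Sequence[Sequence[float]],
                    train_potency: Sequence[float]) -> float:
    """1-nearest-neighbour regressor, used here as a simple stand-in."""
    best = min(range(len(train_fps)),
               key=lambda i: euclidean(query_fp, train_fps[i]))
    return train_potency[best]


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the potency values."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


# Three toy models play the role of the QSAR model set.
models = [
    make_linear_model([1.0, 0.0], 5.0),
    make_linear_model([0.0, 1.0], 6.0),
    make_linear_model([0.5, 0.5], 4.0),
]

# Made-up training and test compounds with made-up pIC50 values.
train_descriptors = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
train_potency = [5.0, 6.5, 7.0]
test_descriptors = [[0.9, 1.1], [1.8, 0.1]]
test_potency = [6.0, 7.5]

# Encode every compound by its predicted activity profile.
train_fps = [affinity_fingerprint(d, models) for d in train_descriptors]
test_fps = [affinity_fingerprint(d, models) for d in test_descriptors]

# Predict potency from the fingerprints and score on the test set.
predictions = [predict_nearest(fp, train_fps, train_potency) for fp in test_fps]
test_rmse = rmse(test_potency, predictions)
print(predictions, test_rmse)
```

In this sketch each compound is represented only by the outputs of the upstream models, so two compounds count as similar when their predicted activity profiles are close, regardless of how different their structures are. The test-set RMSE is the kind of error figure the abstract reports in pIC50 units.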


Last updated: 2020-06-05
