Predicted Biological Activity of Purchasable Chemical Space
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2017-12-29, DOI: 10.1021/acs.jcim.7b00316
John J. Irwin, Garrett Gaskins, Teague Sterling, Michael M. Mysinger, Michael J. Keiser

Whereas 400 million distinct compounds are now purchasable within the span of a few weeks, the biological activities of most are unknown. To facilitate access to new chemistry for biology, we have combined the Similarity Ensemble Approach (SEA) with the maximum Tanimoto similarity to the nearest bioactive to predict activity for every commercially available molecule in ZINC. This method, which we label SEA+TC, outperforms both SEA and a naïve-Bayesian classifier via predictive performance on a 5-fold cross-validation of ChEMBL’s bioactivity data set (version 21). Using this method, predictions for over 40% of compounds (>160 million) have either high significance (pSEA ≥ 40), high similarity (ECFP4MaxTc ≥ 0.4), or both, for one or more of 1382 targets well described by ligands in the literature. Using a further 1347 less-well-described targets, we predict activities for an additional 11 million compounds. To gauge whether these predictions are sensible, we investigate 75 predictions for 50 drugs lacking a binding affinity annotation in ChEMBL. The 535 million predictions for over 171 million compounds at 2629 targets are linked to purchasing information and evidence to support each prediction and are freely available via https://zinc15.docking.org and https://files.docking.org.
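The two cutoffs quoted in the abstract (pSEA ≥ 40 and ECFP4 MaxTc ≥ 0.4) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that fingerprints are represented as sets of on-bit indices, that the pSEA value for a compound-target pair has already been produced by a SEA run (computing it is not shown), and that the helper names `tanimoto`, `max_tc`, and `classify` are hypothetical.

```python
from typing import FrozenSet, Iterable

# Cutoffs quoted in the abstract.
P_SEA_MIN = 40.0   # "high significance" threshold on pSEA
MAX_TC_MIN = 0.4   # "high similarity" threshold on ECFP4 MaxTc


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as on-bit sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_tc(query: FrozenSet[int], ligands: Iterable[FrozenSet[int]]) -> float:
    """Maximum Tanimoto similarity of the query to any known ligand of a target."""
    return max((tanimoto(query, ligand) for ligand in ligands), default=0.0)


def classify(p_sea: float, max_tc_value: float) -> str:
    """Label a prediction by which of the two cutoffs it passes.

    p_sea is assumed to come from an external SEA calculation.
    """
    significant = p_sea >= P_SEA_MIN
    similar = max_tc_value >= MAX_TC_MIN
    if significant and similar:
        return "both"
    if significant:
        return "high significance"
    if similar:
        return "high similarity"
    return "neither"


if __name__ == "__main__":
    # Toy fingerprints; real ECFP4 bit sets would come from a cheminformatics toolkit.
    query = frozenset({1, 2, 3, 4})
    target_ligands = [frozenset({1, 2}), frozenset({1, 2, 3, 4, 5})]
    nearest = max_tc(query, target_ligands)
    print(nearest, classify(p_sea=45.0, max_tc_value=nearest))
```

A prediction that lands in any category other than "neither" corresponds to the "high significance, high similarity, or both" group described in the abstract.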

Updated: 2017-12-29