Differential Multimolecule Fingerprint for Similarity Search: Making Use of Active and Inactive Compound Sets in Virtual Screening
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2022-05-25, DOI: 10.1021/acs.jcim.2c00242
Michael C Hutter

In conventional fingerprint methods, the similarity between two molecules is calculated using the Tanimoto index as a numerical criterion. The query molecules used in virtual screening should therefore be as representative as possible of the desired compound class. In the concept introduced here, all available active molecules form a multimolecule fingerprint in which each feature is weighted by the frequency with which it occurs. The features of inactive molecules are treated likewise, and the resulting values are subtracted from those of the active ones. The resulting differential multimolecule fingerprint (DMMFP) is thus specific for the respective class of compounds. To account for the noninteger representation within this fingerprint, a modified Sørensen–Dice coefficient is used to compute the similarity. Potentially active molecules yield positive scores, whereas presumably inactive ones receive negative scores. The concept was applied to angiotensin-converting enzyme (ACE) inhibitors, β2-adrenoceptor ligands, leukotriene A4 hydrolase inhibitors, dopamine D3 antagonists, and cytochrome CYP2C9 substrates, for which experimental binding affinities are known. It was tested against decoys from DUD-E and a further background database consisting of molecules from the dark chemical matter, that is, compounds that have been tested in many assays without showing activity. Using the 166 publicly available keys of the MACCS fingerprint and the larger PubChem fingerprint, actives were recovered with very high sensitivity. Furthermore, three marketed ACE inhibitors as well as the carbonic anhydrase II inhibitor dorzolamide were detected in the dark chemical matter data set. For comparison, the DMMFP was also used with a Bayesian classifier, for which the specificity (correctly classified inactives) and likewise the accuracy were superior. Conversely, the similarity score produced by the Sørensen–Dice coefficient showed its potential for the early recognition of (potentially) active molecules.
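
The construction described in the abstract can be illustrated with a minimal Python sketch. It assumes binary key-based fingerprints (such as MACCS keys) stored as lists of 0/1 values. The abstract does not give the exact form of the modified Sørensen–Dice coefficient, so the scoring function below is only one plausible signed variant chosen for illustration, not the formula from the paper. All function names are hypothetical.

```python
from typing import List, Sequence


def frequency_fingerprint(fps: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of molecules in the set that have each feature switched on."""
    n = len(fps)
    length = len(fps[0])
    return [sum(fp[i] for fp in fps) / n for i in range(length)]


def differential_fingerprint(
    actives: Sequence[Sequence[int]],
    inactives: Sequence[Sequence[int]],
) -> List[float]:
    """Feature frequencies of the actives minus those of the inactives."""
    freq_active = frequency_fingerprint(actives)
    freq_inactive = frequency_fingerprint(inactives)
    return [a - b for a, b in zip(freq_active, freq_inactive)]


def dice_like_score(weights: Sequence[float], candidate: Sequence[int]) -> float:
    """Signed Dice-style score of a binary candidate against real-valued weights.

    ASSUMPTION: 2 * sum(w_i * x_i) / (sum(|w_i|) + sum(x_i)). This form is an
    illustrative guess; the paper's modified coefficient may differ.
    """
    numerator = 2.0 * sum(w * x for w, x in zip(weights, candidate))
    denominator = sum(abs(w) for w in weights) + sum(candidate)
    return numerator / denominator if denominator else 0.0


# Toy data: four-bit fingerprints invented purely for demonstration.
toy_actives = [[1, 1, 0, 0], [1, 0, 0, 0]]
toy_inactives = [[0, 0, 1, 1], [0, 0, 1, 0]]
toy_dmmfp = differential_fingerprint(toy_actives, toy_inactives)

active_like_score = dice_like_score(toy_dmmfp, [1, 1, 0, 0])
inactive_like_score = dice_like_score(toy_dmmfp, [0, 0, 1, 1])
```

With this toy data, a candidate that shares features with the actives receives a positive score and one that resembles the inactives receives a negative score, which mirrors the sign convention stated in the abstract. In practice the input fingerprints would come from a cheminformatics toolkit rather than hand-written lists.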

Updated: 2022-05-25