Statistical-based database fingerprint: chemical space dependent representation of compound databases.
Journal of Cheminformatics (IF 8.6), Pub Date: 2018-11-22, DOI: 10.1186/s13321-018-0311-x
Norberto Sánchez-Cruz, José L Medina-Franco

Simplified representations of compound databases have several applications in cheminformatics. Herein, we introduce an alternative and general method to build single-fingerprint representations of compound databases. The approach is inspired by the previously published modal fingerprints, which aim to capture the most significant bits of a fingerprint representation for a compound data set. The novelty of the proposed statistical-based database fingerprint (SB-DFP) is that it is generated from binomial proportion comparisons, taking as reference the distribution of "1" bits in a large representative set of the chemical space. To illustrate the method, SB-DFPs were constructed for 28 epigenetic target data sets retrieved from a recently published epigenomics database of interest in probe and drug discovery. For each target data set, the SB-DFPs were built from two representative fingerprints of different design, using as reference a data set of more than 15 million compounds from ZINC. The application of SB-DFP was illustrated and compared to other methods through association relationships among the 28 epigenetic data sets and through similarity searching. Overall, SB-DFPs captured both the common features between data sets and the distinct features of each set. In similarity searching, SB-DFP equaled or outperformed other approaches for at least 20 of the 28 sets. SB-DFP is a general approach based on binomial proportion comparisons to represent a compound data set with a single fingerprint. SB-DFP can be developed, at least in principle, from any fingerprint and any reference data set. SB-DFP is a good alternative for exploring relationships between targets through their associated compound data sets and for performing similarity searching.
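
The core idea, comparing per-bit "1" proportions in a data set against a reference distribution, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact statistical test or significance level, so the sketch assumes a one-sided, one-sample z-test for a binomial proportion (treating the reference proportion as fixed, which is plausible only because the reference set is very large) and a hypothetical threshold `z_threshold`. The function names `sb_dfp_sketch` and `tanimoto` are made up for this example.

```python
import math


def sb_dfp_sketch(dataset_fps, ref_proportions, z_threshold=2.58):
    """Build a single database fingerprint from a list of binary fingerprints.

    A bit is set to 1 when its "1" proportion in the data set is
    significantly higher than in the reference set.
    ASSUMPTION: one-sided one-sample z-test; the threshold is illustrative.
    """
    n = len(dataset_fps)
    fingerprint = []
    for j, p_ref in enumerate(ref_proportions):
        # Observed proportion of "1" at bit position j in the data set.
        p_obs = sum(fp[j] for fp in dataset_fps) / n
        # Standard error under the null hypothesis p = p_ref.
        se = math.sqrt(p_ref * (1.0 - p_ref) / n)
        if se == 0.0:
            # Reference bit is constant; any excess of "1" counts as enrichment.
            bit = 1 if p_obs > p_ref else 0
        else:
            z = (p_obs - p_ref) / se
            bit = 1 if z > z_threshold else 0
        fingerprint.append(bit)
    return fingerprint


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two binary fingerprints of equal length."""
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    return both / either if either else 0.0


# Toy data: four 4-bit fingerprints and made-up reference proportions.
toy_dataset = [[1, 0, 1, 0], [1, 0, 0, 0], [1, 1, 1, 0], [1, 0, 1, 0]]
toy_reference = [0.2, 0.3, 0.7, 0.0]
toy_dfp = sb_dfp_sketch(toy_dataset, toy_reference)
query_similarity = tanimoto(toy_dfp, [1, 1, 0, 0])
```

In this toy case only the first bit is enriched relative to the reference, so it is the only bit set in `toy_dfp`. For similarity searching, the resulting database fingerprint can be compared against each candidate compound's fingerprint with `tanimoto`, and candidates ranked by score.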

Updated: 2018-11-22