A visual approach for analysis and inference of molecular activity spaces
Journal of Cheminformatics (IF 7.1), Pub Date: 2019-10-22, DOI: 10.1186/s13321-019-0386-z
Samina Kausar, Andre O Falcao

Molecular space visualization can help explore the diversity of large, heterogeneous chemical data sets, which may ultimately improve the understanding of structure-activity relationships (SAR) in drug discovery projects. Visual SAR analysis can therefore support library design, the chemical classification of compounds for biological evaluation, and virtual screening to select compounds for synthesis or in vitro testing. Computational approaches to molecular space visualization have consequently become an important topic in cheminformatics research. The proposed approach uses molecular similarity as the sole input for computing a probabilistic surface of molecular activity (PSMA). The similarity matrix is projected into 2D using four dimensionality reduction algorithms: Principal Coordinates Analysis (PCooA), Kruskal multidimensional scaling, Sammon mapping and t-SNE. A kernel density function is then applied to this projection to compute the probability of activity at each coordinate of the new projected space. The methodology was tested on four quantitative structure-activity relationship (QSAR) binary classification data sets, and a PSMA was computed for each. For all data sets and all dimensionality reduction algorithms, the generated maps were internally consistent, with active molecules grouped together. To validate the quality of the maps, the 2D coordinates of test molecules were computed in the new reference space using a data transformation matrix. In total, sixteen PSMAs were built, and their performance was assessed using the Area Under the Curve (AUC) and the Matthews Correlation Coefficient (MCC). For the best projection of each data set, test AUC values ranged from 0.87 to 0.98 and MCC scores from 0.33 to 0.77, suggesting that this methodology can validly capture the complexities of the molecular activity space.
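The pipeline described above (similarity matrix, 2D projection, kernel density surface, placement of test molecules) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' code: it implements only one of the four reducers, classical PCooA (Gower double-centring followed by an eigendecomposition); it assumes distances are obtained as `d = 1 - s`; it places new molecules with Gower's formula for adding a point to a principal coordinates analysis, used here as a stand-in for the "data transformation matrix"; and it reads the probability of activity at a point as the ratio of Gaussian kernel mass from active training molecules to that from all training molecules (a Nadaraya-Watson estimate) with a hypothetical bandwidth `h`. The names `fit_pcooa`, `project` and `psma_probability`, and the toy data, are illustrative only.

```python
import numpy as np


def fit_pcooa(sim, k=2):
    """Principal Coordinates Analysis (PCooA) of a symmetric similarity matrix.

    Assumption: distances are d = 1 - s. Returns the k-D coordinates of the
    training molecules plus what is needed to place new molecules later.
    """
    d2 = (1.0 - np.asarray(sim, dtype=float)) ** 2   # squared distances
    n = d2.shape[0]
    j = np.eye(n) - np.full((n, n), 1.0 / n)          # centring matrix
    b = -0.5 * j @ d2 @ j                             # Gower double-centred matrix
    vals, vecs = np.linalg.eigh(b)                    # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]                  # indices of the k largest
    lam = np.clip(vals[top], 1e-12, None)             # guard zero/negative eigenvalues
    vecs = vecs[:, top]
    coords = vecs * np.sqrt(lam)                      # principal coordinates
    transform = vecs / np.sqrt(lam)                   # linear map for new molecules
    return coords, transform, d2.mean(axis=0)


def project(sim_to_train, transform, mean_d2):
    """Gower's add-a-point formula: coordinates from similarities to the training set."""
    d2_new = (1.0 - np.atleast_2d(np.asarray(sim_to_train, dtype=float))) ** 2
    return -0.5 * (d2_new - mean_d2) @ transform


def psma_probability(points, coords, active, h=0.1):
    """Kernel estimate of P(active) at each 2-D point.

    Ratio of Gaussian kernel mass from active training molecules to the mass
    from all training molecules; h is a hypothetical bandwidth.
    """
    points = np.atleast_2d(points)
    diff = points[:, None, :] - coords[None, :, :]           # (points, molecules, 2)
    w = np.exp(-0.5 * (diff ** 2).sum(axis=-1) / h ** 2)     # kernel weights
    active = np.asarray(active, dtype=bool)
    return w[:, active].sum(axis=1) / np.maximum(w.sum(axis=1), 1e-300)


# Toy demo with synthetic data: two clusters of "molecules" in a plane.
rng = np.random.default_rng(0)
xy = np.vstack([rng.normal(0.25, 0.05, (15, 2)), rng.normal(0.65, 0.05, (15, 2))])
labels = np.array([1] * 15 + [0] * 15)                       # first cluster is active
dist = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(-1))
sim = 1.0 - dist                                             # toy similarity matrix
train = np.arange(30) % 3 != 0                               # 20 training, 10 test
coords, transform, mean_d2 = fit_pcooa(sim[np.ix_(train, train)])
test_xy = project(sim[np.ix_(~train, train)], transform, mean_d2)
p_test = psma_probability(test_xy, coords, labels[train] == 1)
print(np.round(p_test, 2), labels[~train])
```

Because the toy distances are exactly Euclidean in two dimensions, PCooA reproduces them and test molecules land at their true positions relative to the training set. Real fingerprint similarities usually yield some negative eigenvalues; this sketch simply keeps the two largest eigenvalues and ignores the rest.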
All four mapping functions gave generally good results, although PCooA and t-SNE performed slightly better overall than Sammon mapping and Kruskal multidimensional scaling. Our results show that an appropriate combination of metric space representation and dimensionality reduction applied over metric spaces can produce a visual PSMA whose consistency is validated by using the map as a classification model. The resulting maps can also serve as prediction tools, since any molecule can be projected into the new reference space as long as its similarities to the molecules used to build the initial similarity matrix can be computed.
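Using a map as a classification model amounts to reading off a probability of activity for each test molecule and scoring those probabilities against the known labels. The sketch below is a minimal illustration, not the authors' evaluation code: it assumes a decision threshold of 0.5 for the MCC (the abstract does not state the cut-off) and computes the AUC as the Mann-Whitney probability that a randomly chosen active is scored above a randomly chosen inactive, with ties counting one half. The function names and the six example probabilities are hypothetical.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = active)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """ROC AUC via the Mann-Whitney statistic (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def evaluate_map(y_true, probabilities, threshold=0.5):
    """Score a map used as a classifier: raw scores for AUC, thresholded for MCC."""
    y_pred = [1 if p >= threshold else 0 for p in probabilities]
    return {"auc": auc(y_true, probabilities), "mcc": mcc(y_true, y_pred)}


# Hypothetical probabilities read off a map for six test molecules.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.92, 0.71, 0.40, 0.55, 0.20, 0.05]
print(evaluate_map(labels, probs))
```

For these six values, eight of the nine active/inactive pairs are ranked correctly (AUC = 8/9), and thresholding gives two true positives, two true negatives, one false positive and one false negative (MCC = 1/3). This shows why the two metrics can diverge: AUC depends only on ranking, while MCC also depends on where the threshold is placed.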

Updated: 2019-10-22