How Precise Are Our Quantitative Structure–Activity Relationship Derived Predictions for New Query Chemicals?
ACS Omega (IF 3.7), Pub Date: 2018-09-19, DOI: 10.1021/acsomega.8b01647
Kunal Roy, Pravin Ambure, Supratik Kar

Quantitative structure–activity relationship (QSAR) models have long been used for making predictions and filling data gaps in diverse fields, including medicinal chemistry, predictive toxicology, environmental fate modeling, materials science, agricultural science, nanoscience, and food science. A QSAR model is usually developed from the chemical information of a properly designed training set and the corresponding experimental response data, and it is validated using one or more test sets for which experimental response data are available. However, it is also of interest to estimate the reliability of predictions when the model is applied to a completely new data set (a true external set), even when the new data points lie within the applicability domain (AD) of the developed model. In the present study, we categorized the quality of predictions for the test set or true external set into three groups (good, moderate, and bad) based on absolute prediction errors. We then combined three criteria under different weighting schemes into a composite score for each prediction: (a) the mean absolute error of leave-one-out predictions for the 10 training compounds closest to each query molecule; (b) the AD in terms of similarity based on the standardization approach; and (c) the proximity of the predicted value of the query compound to the mean training response. With the most frequently appearing weighting scheme, 0.5–0–0.5, the composite score-based categorization agreed with the absolute prediction error-based categorization for more than 80% of test data points across 5 different datasets, with 15 models for each dataset derived using three different splitting techniques. These observations were also confirmed with true external sets for another four endpoints, suggesting that the scheme can be used to judge the reliability of predictions for new datasets.
The scheme has been implemented in a tool, “Prediction Reliability Indicator”, available at http://dtclab.webs.com/software-tools and http://teqip.jdvu.ac.in/QSAR_Tools/DTCLab/; the tool is presently valid for multiple linear regression models only.
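
The weighted composite score and the concordance check described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' tool and does not reproduce its rules: the abstract does not state how each criterion is scored or where the category cutoffs lie, so the 1-to-3 scoring scale, the cutoff values, and the function names `composite_score`, `categorize`, and `concordance` are all assumptions made for illustration. Only the three-criterion structure, the 0.5–0–0.5 weighting, and the good/moderate/bad categories come from the text.

```python
from typing import Sequence

# ASSUMPTION: each criterion is scored 3 (good), 2 (moderate) or 1 (bad).
# The abstract does not give the real scoring rules.
# Criterion order: (a) local LOO error, (b) AD similarity, (c) proximity to mean.
DEFAULT_WEIGHTS = (0.5, 0.0, 0.5)  # the 0.5-0-0.5 scheme named in the abstract


def composite_score(criterion_scores: Sequence[float],
                    weights: Sequence[float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of the per-criterion scores for one query compound."""
    if len(criterion_scores) != len(weights):
        raise ValueError("need exactly one weight per criterion")
    return sum(w * s for w, s in zip(weights, criterion_scores))


def categorize(score: float,
               good_cutoff: float = 2.5,
               moderate_cutoff: float = 1.5) -> str:
    """Map a composite score to a category (cutoffs are hypothetical)."""
    if score >= good_cutoff:
        return "good"
    if score >= moderate_cutoff:
        return "moderate"
    return "bad"


def concordance(score_based: Sequence[str], error_based: Sequence[str]) -> float:
    """Fraction of compounds whose two categorizations agree."""
    if len(score_based) != len(error_based) or not score_based:
        raise ValueError("need two non-empty category lists of equal length")
    matches = sum(s == e for s, e in zip(score_based, error_based))
    return matches / len(score_based)


# Example: criterion (b) has zero weight, so its score does not affect the result.
example = composite_score((3, 1, 2))
print(example, categorize(example))
```

In this sketch, a concordance above 0.8 would correspond to the "more than 80%" agreement reported in the abstract, with the error-based categories computed from known experimental responses.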

Updated: 2018-09-19