Experimental Error, Kurtosis, Activity Cliffs, and Methodology: What Limits the Predictivity of Quantitative Structure-Activity Relationship Models?
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2020-03-24, DOI: 10.1021/acs.jcim.9b01067
Robert P Sheridan 1 , Prabha Karnachi 1 , Matthew Tudor 2 , Yuting Xu 3 , Andy Liaw 3 , Falgun Shah 2 , Alan C Cheng 4 , Elizabeth Joshi 5 , Meir Glick 6 , Juan Alvarez 6

Given a particular descriptor/method combination, some quantitative structure-activity relationship (QSAR) datasets are very predictive by random-split cross-validation while others are not. Recent literature in modelability suggests that the limiting issue for predictivity is in the data, not the QSAR methodology, and the limits are due to activity cliffs. Here, we investigate, on in-house data, the relative usefulness of experimental error, distribution of the activities, and activity cliff metrics in determining how predictive a dataset is likely to be. We include unmodified in-house datasets, datasets that should be perfectly predictive based only on the chemical structure, datasets where the distribution of activities is manipulated, and datasets that include a known amount of added noise. We find that activity cliff metrics determine predictivity better than the other metrics we investigated, whatever the type of dataset, consistent with the modelability literature. However, such metrics cannot distinguish real activity cliffs due to large uncertainties in the activities. We also show that a number of modern QSAR methods, and some alternative descriptors, are equally bad at predicting the activities of compounds on activity cliffs, consistent with the assumptions behind "modelability." Finally, we relate time-split predictivity with random-split predictivity and show that different coverages of chemical space are at least as important as uncertainty in activity and/or activity cliffs in limiting predictivity.
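To make two of the quantities named in the abstract concrete, here is a minimal sketch in Python. It is not the authors' code: the abstract does not say which activity cliff metrics or which kurtosis estimator were used, so this sketch assumes a SALI-style pairwise index, |A_i - A_j| / (1 - similarity), a common choice in the activity-cliff literature, together with the population excess kurtosis of the activity distribution. The function names, the `eps` guard, and the toy data are hypothetical, and sets of integers stand in for real fingerprints.

```python
from itertools import combinations
from statistics import fmean


def excess_kurtosis(values):
    """Population excess kurtosis, m4 / m2**2 - 3 (0 for a normal distribution)."""
    mu = fmean(values)
    m2 = fmean((v - mu) ** 2 for v in values)
    m4 = fmean((v - mu) ** 4 for v in values)
    return m4 / m2 ** 2 - 3.0


def tanimoto(a, b):
    """Tanimoto similarity between two sets of feature identifiers."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def sali_pairs(features, activities, eps=1e-6):
    """SALI-style index |Ai - Aj| / (1 - sim) for every pair, largest first.

    eps is an assumed guard against division by zero for identical feature sets.
    """
    scored = []
    for i, j in combinations(range(len(activities)), 2):
        sim = tanimoto(features[i], features[j])
        gap = abs(activities[i] - activities[j])
        scored.append((gap / max(1.0 - sim, eps), i, j))
    return sorted(scored, reverse=True)


# Hypothetical toy data: feature sets stand in for fingerprints,
# activities for pIC50-like values.
features = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}, {7, 8, 10}]
activities = [5.0, 8.0, 6.0, 6.1]

top = sali_pairs(features, activities)[0]
print("steepest pair:", top[1:], "index:", top[0])
print("excess kurtosis:", excess_kurtosis(activities))
```

In this toy set the pair with similar features but a large activity gap scores highest. The index cannot tell whether that gap is real or a product of measurement noise, which is the limitation the abstract points out.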

Updated: 2020-03-24