Conformal Regression for Quantitative Structure–Activity Relationship Modeling—Quantifying Prediction Uncertainty
Journal of Chemical Information and Modeling (IF 5.6), Pub Date: 2018-04-27, DOI: 10.1021/acs.jcim.8b00054
Fredrik Svensson 1, 2 , Natalia Aniceto 1 , Ulf Norinder 3, 4 , Isidro Cortes-Ciriano 1 , Ola Spjuth 5 , Lars Carlsson 6, 7 , Andreas Bender 1

Making predictions with an associated confidence is highly desirable as it facilitates decision making and resource prioritization. Conformal regression is a machine learning framework that allows the user to define the required confidence and delivers predictions that are guaranteed to be correct to the selected extent. In this study, we apply conformal regression to model molecular properties and bioactivity values and investigate different ways to scale the resultant prediction intervals to create as efficient (i.e., narrow) regressors as possible. Different algorithms to estimate the prediction uncertainty were used to normalize the prediction ranges, and the different approaches were evaluated on 29 publicly available data sets. Our results show that the most efficient conformal regressors are obtained when using the natural exponential of the ensemble standard deviation from the underlying random forest to scale the prediction intervals, but other approaches were almost as efficient. This approach afforded an average prediction range of 1.65 pIC50 units at the 80% confidence level when applied to bioactivity modeling. The choice of nonconformity function has a pronounced impact on the average prediction range with a difference of close to one log unit in bioactivity between the tightest and widest prediction range. Overall, conformal regression is a robust approach to generate bioactivity predictions with associated confidence.
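
The interval scaling described in the abstract can be illustrated with a minimal sketch of inductive (split) conformal regression. It assumes that point predictions and ensemble standard deviations have already been computed by some underlying model, such as a random forest. The function name `conformal_intervals`, its arguments, and the exact form of the nonconformity score (absolute error divided by the natural exponential of the ensemble standard deviation) are illustrative assumptions based on the wording of the abstract, not the authors' published code.

```python
import math


def conformal_intervals(cal_true, cal_pred, cal_sigma,
                        test_pred, test_sigma, confidence=0.8):
    """Return (lower, upper) prediction intervals for each test prediction.

    Hypothetical helper: inputs are plain lists of floats holding
    calibration-set true values, predictions and ensemble standard
    deviations, plus test-set predictions and standard deviations.
    """
    # Normalized nonconformity score (assumed form): |error| / exp(sigma).
    scores = sorted(
        abs(y - p) / math.exp(s)
        for y, p, s in zip(cal_true, cal_pred, cal_sigma)
    )
    n = len(scores)
    # Rank of the calibration score that yields the requested coverage.
    k = math.ceil((n + 1) * confidence)
    # With too few calibration points the interval is unbounded.
    q = scores[k - 1] if k <= n else math.inf
    # Scale the calibrated score back by exp(sigma) of each test compound.
    return [
        (p - q * math.exp(s), p + q * math.exp(s))
        for p, s in zip(test_pred, test_sigma)
    ]


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the study.
    intervals = conformal_intervals(
        cal_true=[1.0, 2.0, 3.0, 4.0],
        cal_pred=[1.5, 2.0, 2.0, 4.25],
        cal_sigma=[0.0, 0.0, 0.0, 0.0],
        test_pred=[5.0],
        test_sigma=[0.0],
        confidence=0.8,
    )
    print(intervals)
```

In this sketch a compound with a larger ensemble standard deviation receives a proportionally wider interval, while the calibration step keeps the overall error rate at the level the user selects. This is the sense in which the choice of scaling affects efficiency (interval width) rather than validity.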

Updated: 2018-04-27