A hybrid framework for improving uncertainty quantification in deep learning-based QSAR regression modeling
Journal of Cheminformatics (IF 7.1), Pub Date: 2021-09-20, DOI: 10.1186/s13321-021-00551-x
Dingyan Wang 1, 2, 3 , Jie Yu 2, 3 , Lifan Chen 2, 3 , Xutong Li 2, 3 , Hualiang Jiang 2, 3 , Kaixian Chen 2, 3 , Mingyue Zheng 2, 3 , Xiaomin Luo 1, 2, 3

Reliable uncertainty quantification for statistical models is crucial in many downstream applications, especially in drug design and discovery, where mistakes can be very costly. The topic has therefore attracted much attention, and a plethora of methods have been proposed in recent years. The approaches reported so far fall mainly into two classes: distance-based approaches and Bayesian approaches. Although these methods have been widely used and have shown promising performance, each with distinct strengths, overconfidence on out-of-distribution examples still hinders their deployment in real-world applications. In this study, we investigated a number of consensus strategies that combine distance-based and Bayesian approaches with post-hoc calibration to improve uncertainty quantification in QSAR (Quantitative Structure–Activity Relationship) regression modeling. We employed a set of criteria to quantitatively assess the ranking and calibration ability of these models. Experiments on 24 bioactivity datasets were designed to critically compare the proposed model with other well-studied baseline models. Our findings indicate that the proposed hybrid framework robustly improves the model's ability to rank absolute errors. Together with post-hoc calibration on the validation set, we show that well-calibrated uncertainty estimates can be obtained in domain shift settings. The complementarity between the different methods is also analyzed conceptually.
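
The abstract does not state which consensus rule, calibration method, or ranking criterion the authors used, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a rank-averaging consensus of a distance-based and a Bayesian uncertainty score, a single scalar scaling factor fitted on validation errors as the post-hoc calibration step, and Spearman correlation as the measure of how well uncertainty ranks absolute errors. All function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def rank_normalize(u):
    """Map scores to (0, 1] by rank; ties are broken by position."""
    ranks = np.argsort(np.argsort(u))
    return (ranks + 1) / len(u)


def consensus_uncertainty(u_distance, u_bayes):
    """Assumed consensus rule: mean of the rank-normalized scores."""
    return 0.5 * (rank_normalize(u_distance) + rank_normalize(u_bayes))


def fit_scale(sigma_val, abs_err_val):
    """Scalar s minimizing the Gaussian NLL of errors under N(0, (s*sigma)^2).

    Setting the derivative of the NLL to zero gives s^2 = mean(err^2 / sigma^2).
    """
    return np.sqrt(np.mean((abs_err_val / sigma_val) ** 2))


def spearman(a, b):
    """Spearman rank correlation (no tie correction)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ra, rb)[0, 1]


# Synthetic illustration only: these are not the paper's data or results.
rng = np.random.default_rng(0)
true_sigma = rng.uniform(0.2, 1.0, size=2000)
abs_err = np.abs(rng.normal(0.0, true_sigma))

# An overconfident "Bayesian" estimate (3x too small) and a noisy
# "distance-based" score that is only loosely related to the true spread.
u_bayes = true_sigma / 3.0
u_distance = true_sigma + rng.normal(0.0, 0.3, size=2000)

scale = fit_scale(u_bayes, abs_err)        # calibration factor from validation errors
calibrated_sigma = scale * u_bayes         # rescaled predictive standard deviation
combined = consensus_uncertainty(u_distance, u_bayes)
ranking_quality = spearman(combined, abs_err)
```

Scaling by a single factor changes calibration but not the ordering of the uncertainties, which is why ranking ability and calibration are assessed separately in this sketch.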

Updated: 2021-09-20