Deeply uncertain: comparing methods of uncertainty quantification in deep learning algorithms
Machine Learning: Science and Technology (IF 6.013) Pub Date: 2020-12-04, DOI: 10.1088/2632-2153/aba6f3
João Caldeira 1, Brian Nord 1, 2, 3

We present a comparison of methods for uncertainty quantification (UQ) in deep learning algorithms in the context of a simple physical system. Three of the most common UQ methods, Bayesian neural networks (BNNs), concrete dropout (CD), and deep ensembles (DEs), are compared to the standard analytic error propagation. We discuss this comparison in terms endemic to both machine learning (‘epistemic’ and ‘aleatoric’) and the physical sciences (‘statistical’ and ‘systematic’). The comparisons are presented in terms of simulated experimental measurements of a single pendulum, a prototypical physical system for studying measurement and analysis techniques. Our results highlight some pitfalls that may occur when using these UQ methods. For example, when the variation of noise in the training set is small, all methods predicted the same relative uncertainty independently of the inputs; this issue was particularly hard to avoid for BNNs. On the other hand, when the test set contains samples far from the training distribution, we found that no method sufficiently increased the uncertainties associated with its predictions; this problem was particularly clear for CD. In light of these results, we make some recommendations for the usage and interpretation of UQ methods.
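To make the setting concrete, below is a minimal, illustrative sketch of one of the three methods named in the abstract, a deep ensemble whose members each predict a Gaussian mean and variance (Lakshminarayanan et al. 2017). This is not the authors' implementation: the pendulum data-generation setup, the network sizes, and the helper names (MeanVarNet, gaussian_nll) are assumptions made only for illustration. The spread of the member means plays the role of the epistemic (systematic) part, and the averaged predicted variance plays the role of the aleatoric (statistical) part.

```python
# Illustrative deep-ensemble UQ sketch on simulated pendulum periods (assumed setup).
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
g = 9.81

# Simulated "experimental" measurements: length -> small-angle period with Gaussian noise.
length = torch.rand(2000, 1) * 1.5 + 0.5          # lengths in [0.5, 2.0] m
period = 2 * math.pi * torch.sqrt(length / g)     # T = 2*pi*sqrt(L/g)
y = period + 0.02 * torch.randn_like(period)      # measurement noise (assumed level)

class MeanVarNet(nn.Module):
    """Small regressor that outputs a predictive mean and log-variance per input."""
    def __init__(self, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, 1)
        self.logvar = nn.Linear(hidden, 1)

    def forward(self, x):
        h = self.body(x)
        return self.mean(h), self.logvar(h)

def gaussian_nll(mu, logvar, target):
    # Gaussian negative log-likelihood; trains the aleatoric-variance head.
    return 0.5 * (logvar + (target - mu) ** 2 / logvar.exp()).mean()

# Train several independently initialized members on the same data.
ensemble = [MeanVarNet() for _ in range(5)]
for net in ensemble:
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(500):
        opt.zero_grad()
        mu, logvar = net(length)
        gaussian_nll(mu, logvar, y).backward()
        opt.step()

# Combine members: epistemic = variance of member means, aleatoric = mean of member variances.
x_test = torch.linspace(0.5, 2.0, 5).unsqueeze(1)
with torch.no_grad():
    outs = [net(x_test) for net in ensemble]
    mus = torch.stack([mu for mu, _ in outs])
    alea = torch.stack([lv.exp() for _, lv in outs])
    mean = mus.mean(dim=0)
    total_std = (mus.var(dim=0) + alea.mean(dim=0)).sqrt()
print(torch.cat([x_test, mean, total_std], dim=1))
```

The same predictive-mean/uncertainty interface applies to the other methods compared in the paper: concrete dropout replaces the explicit ensemble with stochastic forward passes, and BNNs sample weights from an approximate posterior.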




Updated: 2020-12-04