Improving Uncertainty Estimates through the Relationship with Adversarial Robustness
arXiv - CS - Machine Learning. Pub Date: 2020-06-29, DOI: arxiv-2006.16375
Yao Qin, Xuezhi Wang, Alex Beutel, Ed H. Chi

Robustness issues arise in a variety of forms and are studied through multiple lenses in the machine learning literature. Neural networks lack adversarial robustness: they are vulnerable to adversarial examples, in which small perturbations to inputs cause incorrect predictions. Further, trust is undermined when models give miscalibrated or unstable uncertainty estimates, i.e., the predicted probability is not a good indicator of how much we should trust the model and can vary greatly over multiple independent runs. In this paper, we study the connection between adversarial robustness, predictive uncertainty (calibration), and model uncertainty (stability) across multiple classification networks and datasets. We find that inputs for which the model is sensitive to small perturbations (i.e., easily attacked) are more likely to have poorly calibrated and unstable predictions. Based on this insight, we examine whether calibration and stability can be improved by addressing those adversarially unrobust inputs. To this end, we propose Adversarial Robustness based Adaptive Label Smoothing (AR-AdaLS), which integrates the correlation between adversarial robustness and uncertainty into training by adaptively softening labels, conditioned on how easily each input can be attacked by adversarial examples. We find that our method, by taking the adversarial robustness of in-distribution data into consideration, leads to better calibration and stability, even under distributional shift. In addition, AR-AdaLS can also be applied to an ensemble model to achieve the best calibration performance.
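The core idea of adaptive label smoothing can be illustrated with a minimal sketch: each training example receives a smoothing strength proportional to a vulnerability score in [0, 1] (e.g., how small a perturbation suffices to flip the prediction). This is only a hypothetical illustration of the mechanism; the function name, the `max_eps` parameter, and the direct per-example mapping are assumptions, not the paper's exact grouping and calibration scheme.

```python
import numpy as np

def adaptive_label_smoothing(labels, vulnerability, num_classes, max_eps=0.2):
    """Soften one-hot labels per example.

    Inputs that are more easily attacked (higher vulnerability in [0, 1])
    receive stronger smoothing toward the uniform distribution.
    """
    # Per-example smoothing strength, scaled by adversarial vulnerability.
    eps = max_eps * np.clip(vulnerability, 0.0, 1.0)
    one_hot = np.eye(num_classes)[labels]                 # shape (N, K)
    # Blend the one-hot target with the uniform distribution 1/K.
    return (1.0 - eps)[:, None] * one_hot + (eps / num_classes)[:, None]

# Example: two inputs with true class 0; the second is easy to attack,
# so its target distribution is softened while the first stays one-hot.
soft = adaptive_label_smoothing(
    labels=np.array([0, 0]),
    vulnerability=np.array([0.0, 1.0]),
    num_classes=4,
)
```

Under this sketch, a robust input keeps its hard label `[1, 0, 0, 0]`, while a vulnerable one is trained against `[0.85, 0.05, 0.05, 0.05]`, discouraging overconfident predictions exactly where the model is least robust.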

Updated: 2020-07-01