Adversarial robustness via stochastic regularization of neural activation sensitivity
arXiv - CS - Neural and Evolutionary Computing Pub Date : 2020-09-23 , DOI: arxiv-2009.11349 Gil Fidel, Ron Bitton, Ziv Katzir, Asaf Shabtai
Recent works have shown that the input domain of any machine learning
classifier is bound to contain adversarial examples. We can therefore no
longer hope to immunize classifiers against adversarial examples; instead, we
can only aim at the following two defense goals: 1) making adversarial
examples harder to find, or 2) weakening their adversarial nature by pushing
them further away from correctly classified data points. Most, if not all,
previously suggested defense mechanisms address just one of these two goals
and, as such, can be bypassed by adaptive attacks that take the defense
mechanism into consideration. In this work we suggest a novel defense
mechanism that simultaneously addresses both defense goals: we flatten the
gradients of the loss surface, making adversarial examples harder to find,
using a novel stochastic regularization term that explicitly decreases the
sensitivity of individual neurons to small input perturbations. In addition,
we push the decision boundary away from correctly classified inputs by
leveraging Jacobian regularization. We present a solid theoretical basis and
an empirical evaluation of the suggested approach, demonstrate its
superiority over previously suggested defense mechanisms, and show that it is
effective against a wide range of adaptive attacks.
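The two regularizers described in the abstract can be illustrated with a toy NumPy sketch. This is not the paper's implementation; all names, the linear model, the noise scale, and the penalty weights (`lam_j`, `lam_s`) are illustrative assumptions. It shows the general shape of the objective: a classification loss, a Jacobian-norm penalty that pushes the decision boundary away from the data, and a stochastic estimate of neuron sensitivity to small input perturbations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-layer softmax classifier: input dim 4, 3 classes.
W = rng.normal(scale=0.1, size=(4, 3))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def jacobian_penalty():
    # For a linear layer z = x @ W, the input Jacobian of the logits is W
    # itself, so penalizing its Frobenius norm is an exact (toy) instance
    # of Jacobian regularization.
    return np.sum(W ** 2)

def sensitivity_penalty(x, sigma=0.05, n_samples=8):
    # Stochastic estimate of activation sensitivity: mean squared change
    # in pre-softmax activations under small Gaussian input perturbations.
    z = x @ W
    total = 0.0
    for _ in range(n_samples):
        noise = rng.normal(scale=sigma, size=x.shape)
        total += np.mean(((x + noise) @ W - z) ** 2)
    return total / n_samples

x = rng.normal(size=(2, 4))   # batch of 2 inputs
y = np.array([0, 2])          # labels
probs = softmax(x @ W)
ce = -np.mean(np.log(probs[np.arange(2), y]))  # cross-entropy loss

lam_j, lam_s = 0.01, 1.0      # illustrative penalty weights
loss = ce + lam_j * jacobian_penalty() + lam_s * sensitivity_penalty(x)
print(float(loss))
```

In a real network the Jacobian is input-dependent and would be computed (or approximated) by automatic differentiation, and both penalties would be minimized jointly with the task loss during training.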
Updated: 2020-09-25
最近的工作表明,任何机器学习分类器的输入域都必然包含对抗性示例。因此,我们不能再希望针对对抗样本免疫分类器,而只能致力于实现以下两个防御目标:1)使对抗样本更难找到,或 2)通过将它们推离正确分类的数据来削弱它们的对抗性点。大多数(如果不是全部)先前建议的防御机制仅涉及这两个目标之一,因此,考虑到防御机制的自适应攻击可以绕过。在这项工作中,我们提出了一种新颖的防御机制,可以同时解决两个防御目标:我们使损失面的梯度变平,使对抗样本更难找到,使用一种新颖的随机正则化项,显着降低单个神经元对小输入扰动的敏感性。此外,我们通过利用雅可比正则化将决策边界推离正确分类的输入。我们提出了一个坚实的理论基础和我们建议的方法的实证测试,证明了它优于先前建议的防御机制,并表明它对广泛的自适应攻击有效。