Adversarial robustness via stochastic regularization of neural activation sensitivity
arXiv - CS - Neural and Evolutionary Computing Pub Date : 2020-09-23 , DOI: arxiv-2009.11349 Gil Fidel, Ron Bitton, Ziv Katzir, Asaf Shabtai
Recent works have shown that the input domain of any machine learning
classifier is bound to contain adversarial examples. We can therefore no
longer hope to immunize classifiers against adversarial examples; instead, we
can only aim at the following two defense goals: 1) making adversarial
examples harder to find, or 2) weakening their adversarial nature by pushing
them further away from correctly classified data points. Most, if not all,
previously suggested defense mechanisms address just one of these two goals
and, as such, can be bypassed by adaptive attacks that take the defense
mechanism into consideration. In this work we suggest a novel defense
mechanism that simultaneously addresses both defense goals: we flatten the
gradients of the loss surface, making adversarial examples harder to find,
using a novel stochastic regularization term that explicitly decreases the
sensitivity of individual neurons to small input perturbations. In addition,
we push the decision boundary away from correctly classified inputs by
leveraging Jacobian regularization. We present a solid theoretical basis and
an empirical evaluation of the suggested approach, demonstrate its
superiority over previously suggested defense mechanisms, and show that it is
effective against a wide range of adaptive attacks.
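The two regularizers described in the abstract can be illustrated with a toy NumPy sketch. This is not the paper's implementation; all names, the linear model, the noise scale, and the penalty weights (`lam_j`, `lam_s`) are illustrative assumptions. It shows the general shape of the objective: a classification loss, a Jacobian-norm penalty that pushes the decision boundary away from the data, and a stochastic estimate of neuron sensitivity to small input perturbations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-layer softmax classifier: input dim 4, 3 classes.
W = rng.normal(scale=0.1, size=(4, 3))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def jacobian_penalty():
    # For a linear layer z = x @ W, the input Jacobian of the logits is W
    # itself, so penalizing its Frobenius norm is an exact (toy) instance
    # of Jacobian regularization.
    return np.sum(W ** 2)

def sensitivity_penalty(x, sigma=0.05, n_samples=8):
    # Stochastic estimate of activation sensitivity: mean squared change
    # in pre-softmax activations under small Gaussian input perturbations.
    z = x @ W
    total = 0.0
    for _ in range(n_samples):
        noise = rng.normal(scale=sigma, size=x.shape)
        total += np.mean(((x + noise) @ W - z) ** 2)
    return total / n_samples

x = rng.normal(size=(2, 4))   # batch of 2 inputs
y = np.array([0, 2])          # labels
probs = softmax(x @ W)
ce = -np.mean(np.log(probs[np.arange(2), y]))  # cross-entropy loss

lam_j, lam_s = 0.01, 1.0      # illustrative penalty weights
loss = ce + lam_j * jacobian_penalty() + lam_s * sensitivity_penalty(x)
print(float(loss))
```

In a real network the Jacobian is input-dependent and would be computed (or approximated) by automatic differentiation, and both penalties would be minimized jointly with the task loss during training.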
Updated: 2020-09-25
最近的工作表明,任何机器学习分类器的输入域都必然包含对抗性示例。因此,我们不能再希望针对对抗样本免疫分类器,而只能致力于实现以下两个防御目标:1)使对抗样本更难找到,或 2)通过将它们推离正确分类的数据来削弱它们的对抗性点。大多数(如果不是全部)先前建议的防御机制仅涉及这两个目标之一,因此,考虑到防御机制的自适应攻击可以绕过。在这项工作中,我们提出了一种新颖的防御机制,可以同时解决两个防御目标:我们使损失面的梯度变平,使对抗样本更难找到,使用一种新颖的随机正则化项,显着降低单个神经元对小输入扰动的敏感性。此外,我们通过利用雅可比正则化将决策边界推离正确分类的输入。我们提出了一个坚实的理论基础和我们建议的方法的实证测试,证明了它优于先前建议的防御机制,并表明它对广泛的自适应攻击有效。