Generating transferable adversarial examples based on perceptually-aligned perturbation
International Journal of Machine Learning and Cybernetics ( IF 3.1 ) Pub Date : 2021-01-12 , DOI: 10.1007/s13042-020-01240-1
Hongqiao Chen , Keda Lu , Xianmin Wang , Jin Li

Neural networks (NNs) are known to be susceptible to adversarial examples (AEs), which are intentionally designed to deceive a target classifier by adding small perturbations to the inputs. Interestingly, AEs crafted for one NN can also mislead another model. This property is referred to as transferability, and it is often leveraged to perform attacks in black-box settings. To mitigate the transferability of AEs, many approaches have been explored to enhance the robustness of NNs. In particular, adversarial training (AT) and its variants have been shown to be the strongest defense against such transferable AEs. To boost the transferability of AEs against robust models that have undergone AT, a novel AE generation method is proposed in this paper. Our method is motivated by the observation that robust models trained with AT are more sensitive to perceptually-relevant gradients; it is therefore reasonable to synthesize AEs from perturbations that carry perceptually-aligned features. The proposed method proceeds in two steps. First, by optimizing the loss function over an ensemble of randomly noised inputs, we obtain perceptually-aligned perturbations with a noise-invariant property. Second, we employ a Perona–Malik (P–M) filter to smooth the derived adversarial perturbations, so that the perceptually-relevant features of the perturbation are significantly reinforced while its local oscillations are substantially suppressed. Our method can be applied to any gradient-based attack. We carry out extensive experiments on the ImageNet dataset against various robust and non-robust models, and the results demonstrate the effectiveness of our method. In particular, by combining our method with the diverse inputs method and the momentum iterative fast gradient sign method, we achieve state-of-the-art performance in fooling robust models.
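
To illustrate the two-step procedure described in the abstract, the sketch below implements a minimal MI-FGSM-style attack in PyTorch: the per-step gradient is averaged over an ensemble of randomly noised copies of the input (so that only noise-invariant, perceptually-aligned components survive) and then smoothed with a Perona–Malik anisotropic-diffusion filter before the sign step. The function names (`ensemble_gradient`, `pm_filter`, `attack`) and all hyper-parameter values are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F


def pm_filter(grad, iters=5, kappa=0.1, gamma=0.2):
    """Perona-Malik anisotropic diffusion on a (B, C, H, W) gradient map.

    Smooths flat regions while preserving strong edges, so the perceptually
    relevant structure of the perturbation is kept and local oscillations are
    damped. kappa and gamma are assumed hyper-parameters.
    """
    g = grad.clone()

    def c(d):  # edge-stopping conductance: ~1 in flat areas, small across edges
        return torch.exp(-(d / kappa) ** 2)

    for _ in range(iters):
        # finite differences toward the four neighbours (replicate padding)
        dn = F.pad(g, (0, 0, 1, 0), mode="replicate")[:, :, :-1, :] - g
        ds = F.pad(g, (0, 0, 0, 1), mode="replicate")[:, :, 1:, :] - g
        de = F.pad(g, (0, 1, 0, 0), mode="replicate")[:, :, :, 1:] - g
        dw = F.pad(g, (1, 0, 0, 0), mode="replicate")[:, :, :, :-1] - g
        g = g + gamma * (c(dn) * dn + c(ds) * ds + c(de) * de + c(dw) * dw)
    return g


def ensemble_gradient(model, x_adv, y, n_samples=8, sigma=0.1):
    """Average the loss gradient over randomly noised copies of the input,
    keeping only the noise-invariant (perceptually-aligned) direction."""
    grad = torch.zeros_like(x_adv)
    for _ in range(n_samples):
        noisy = (x_adv.detach() + sigma * torch.randn_like(x_adv)).clamp(0, 1)
        noisy.requires_grad_(True)
        loss = F.cross_entropy(model(noisy), y)
        grad += torch.autograd.grad(loss, noisy)[0]
    return grad / n_samples


def attack(model, x, y, eps=16 / 255, steps=10, mu=1.0):
    """Untargeted MI-FGSM-style loop driven by the smoothed, noise-averaged gradient."""
    alpha = eps / steps
    x_adv = x.clone()
    momentum = torch.zeros_like(x)
    for _ in range(steps):
        g = ensemble_gradient(model, x_adv, y)
        g = pm_filter(g)  # reinforce perceptually-relevant structure, suppress oscillation
        momentum = mu * momentum + g / g.abs().mean(dim=(1, 2, 3), keepdim=True)
        x_adv = (x_adv + alpha * momentum.sign()).detach()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv
```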



Updated: 2021-01-12