What it Thinks is Important is Important: Robustness Transfers through Input Gradients
arXiv - CS - Neural and Evolutionary Computing. Pub Date: 2019-12-11, DOI: arxiv-1912.05699
Alvin Chan, Yi Tay and Yew-Soon Ong

Adversarial perturbations are imperceptible changes to input pixels that can alter the predictions of deep learning models. The learned weights of models robust to such perturbations have previously been found to transfer across different tasks, but only when the source and target tasks use the same model architecture. Input gradients characterize how small changes at each input pixel affect the model output. Using only natural images, we show that training a student model's input gradients to match those of a robust teacher model yields robustness close to that of a strong baseline trained robustly from scratch. Through experiments on MNIST, CIFAR-10, CIFAR-100 and Tiny-ImageNet, we show that our proposed method, input gradient adversarial matching, can transfer robustness across different tasks and even across different model architectures. This demonstrates that directly targeting the semantics of input gradients is a feasible path towards adversarial robustness.
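To make the core idea concrete, the following is a minimal PyTorch sketch of gradient-matching distillation on natural images. It is not the authors' method: a plain L2 penalty stands in for whatever adversarial matching objective "input gradient adversarial matching" actually uses, and names such as `igam_step` and the weight `lam` are hypothetical illustrations.

```python
# Minimal sketch, assuming a standard PyTorch setup. An L2 penalty between
# the student's and the robust teacher's input gradients stands in for the
# paper's adversarial matching objective.
import torch
import torch.nn.functional as F

def input_gradient(model, x, y, create_graph=False):
    # Gradient of the cross-entropy loss w.r.t. the input pixels:
    # how a small change at each pixel would move the model's loss.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(loss, x, create_graph=create_graph)
    return grad

def igam_step(student, teacher, x, y, optimizer, lam=1.0):
    # One training step on natural images: the usual task loss plus a
    # penalty pulling the student's input gradients towards the teacher's.
    teacher.eval()
    g_teacher = input_gradient(teacher, x, y).detach()             # fixed target
    g_student = input_gradient(student, x, y, create_graph=True)   # trainable
    task_loss = F.cross_entropy(student(x), y)
    match_loss = F.mse_loss(g_student, g_teacher)
    loss = task_loss + lam * match_loss
    optimizer.zero_grad()
    loss.backward()  # double backprop through the student's input gradient
    optimizer.step()
    return loss.item()
```

Because input gradients live in input space rather than weight space, the teacher and student need not share an architecture, which is what allows robustness to transfer across architectures as well as tasks.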

Updated: 2020-10-30