Spatially Correlated Patterns in Adversarial Images
arXiv - CS - Artificial Intelligence Pub Date : 2020-11-21 , DOI: arxiv-2011.10794 Nandish Chattopadhyay, Lionell Yip En Zhi, Bryan Tan Bing Xing, Anupam Chattopadhyay
Adversarial attacks have proved to be a major impediment to progress towards
reliable machine learning solutions. Carefully crafted perturbations,
imperceptible to human vision, can be added to images to force
misclassification by an otherwise high-performing neural network. To better
understand the key contributors to such structured attacks, we searched for
and studied spatially co-located patterns in the distribution of pixels in
the input space. In this paper, we propose a framework for segregating and
isolating regions within an input image that are particularly critical to
classification (during inference), to adversarial vulnerability, or to both.
We assert that during inference the trained model attends to a specific
region of the image, which we call the Region of Importance (RoI), while the
attacker targets a region to alter, which we call the Region of Attack (RoA).
As our observations illustrate, the success of this approach can also be used
to design a post-hoc adversarial defence method: it blocks out (we call this
neutralizing) the region of the image that is highly vulnerable to
adversarial attacks but is not important for the classification task. We
establish the theoretical setup for formalising the process of segregation,
isolation, and neutralization, and substantiate it through empirical analysis
on standard benchmarking datasets. The findings strongly indicate that
mapping features into the input space preserves the significant patterns
typically observed in the feature space while adding substantial
interpretability, and therefore simplifies potential defensive mechanisms.
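The segregate-isolate-neutralize idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes two per-pixel score maps are already available — a saliency-style map as a proxy for the RoI and a perturbation-magnitude map as a proxy for the RoA — and the function names, the top-k thresholding, and the mean-fill neutralization are illustrative choices.

```python
import numpy as np

def region_mask(score_map, keep_fraction=0.2):
    """Binary mask selecting the top `keep_fraction` highest-scoring pixels."""
    flat = score_map.ravel()
    k = max(1, int(keep_fraction * flat.size))
    # k-th largest value becomes the inclusion threshold
    threshold = np.partition(flat, -k)[-k]
    return score_map >= threshold

def neutralize(image, roi_score, roa_score, keep_fraction=0.2):
    """Block out pixels that lie in the RoA but not in the RoI.

    `roi_score`: per-pixel importance for classification (e.g. a saliency map).
    `roa_score`: per-pixel adversarial vulnerability (e.g. |perturbation|).
    Pixels that are attack-prone yet unimportant for the task are replaced
    with the image mean, leaving the RoI untouched.
    """
    roi = region_mask(roi_score, keep_fraction)
    roa = region_mask(roa_score, keep_fraction)
    target = roa & ~roi          # vulnerable but not important
    out = image.copy()
    out[target] = image.mean()   # one simple choice of "neutral" fill value
    return out, target
```

For example, if the RoI sits in one corner of the image and the RoA in another, only the RoA pixels are overwritten and the classification-critical region survives intact. How the two score maps are obtained, and what fill value best preserves accuracy, are exactly the questions the paper's framework addresses.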
Updated: 2020-11-25