Spatially Correlated Patterns in Adversarial Images
arXiv - CS - Artificial Intelligence Pub Date : 2020-11-21 , DOI: arxiv-2011.10794 Nandish Chattopadhyay, Lionell Yip En Zhi, Bryan Tan Bing Xing, Anupam Chattopadhyay
Adversarial attacks have proved to be a major impediment to progress towards
reliable machine learning solutions. Carefully crafted perturbations,
imperceptible to human vision, can be added to images to force
misclassification by an otherwise high-performing neural network. To better
understand the key contributors to such structured attacks, we searched for
and studied spatially co-located patterns in the distribution of pixels in
the input space. In this paper, we propose a framework for segregating and
isolating regions within an input image that are particularly critical to
classification (during inference), to adversarial vulnerability, or to both.
We assert that during inference the trained model attends to a specific
region of the image, which we call the Region of Importance (RoI), while the
attacker targets a region to alter, which we call the Region of Attack (RoA).
As our observations illustrate, the success of this approach can also be used
to design a post-hoc adversarial defence method: it blocks out (we call this
neutralizing) the region of the image that is highly vulnerable to
adversarial attacks but is not important for the classification task. We
establish the theoretical setup for formalising the process of segregation,
isolation, and neutralization, and substantiate it through empirical analysis
on standard benchmarking datasets. The findings strongly indicate that
mapping features into the input space preserves the significant patterns
typically observed in the feature space while adding substantial
interpretability, and therefore simplifies potential defensive mechanisms.
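The segregate-isolate-neutralize idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes two per-pixel score maps are already available — a saliency-style map as a proxy for the RoI and a perturbation-magnitude map as a proxy for the RoA — and the function names, the top-k thresholding, and the mean-fill neutralization are illustrative choices.

```python
import numpy as np

def region_mask(score_map, keep_fraction=0.2):
    """Binary mask selecting the top `keep_fraction` highest-scoring pixels."""
    flat = score_map.ravel()
    k = max(1, int(keep_fraction * flat.size))
    # k-th largest value becomes the inclusion threshold
    threshold = np.partition(flat, -k)[-k]
    return score_map >= threshold

def neutralize(image, roi_score, roa_score, keep_fraction=0.2):
    """Block out pixels that lie in the RoA but not in the RoI.

    `roi_score`: per-pixel importance for classification (e.g. a saliency map).
    `roa_score`: per-pixel adversarial vulnerability (e.g. |perturbation|).
    Pixels that are attack-prone yet unimportant for the task are replaced
    with the image mean, leaving the RoI untouched.
    """
    roi = region_mask(roi_score, keep_fraction)
    roa = region_mask(roa_score, keep_fraction)
    target = roa & ~roi          # vulnerable but not important
    out = image.copy()
    out[target] = image.mean()   # one simple choice of "neutral" fill value
    return out, target
```

For example, if the RoI sits in one corner of the image and the RoA in another, only the RoA pixels are overwritten and the classification-critical region survives intact. How the two score maps are obtained, and what fill value best preserves accuracy, are exactly the questions the paper's framework addresses.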
Updated: 2020-11-25