Better constraints of imperceptibility, better adversarial examples in the text
International Journal of Intelligent Systems (IF 7), Pub Date: 2021-09-30, DOI: 10.1002/int.22696
Wenqi Wang, Lina Wang, Run Wang, Aoshuang Ye, Jianpeng Ke

State-of-the-art adversarial attacks in the text domain have shown their power to induce machine learning models to produce abnormal outputs. The samples generated in these attacks have three important attributes: attack ability, transferability, and imperceptibility. Compared with the other two attributes, however, the imperceptibility of adversarial examples has not been well investigated. Unlike pixel-level perturbations in images, adversarial perturbations in text are usually traceable, because they appear as changes to characters, words, or sentences. Generating imperceptible samples is therefore more difficult for text than for images, and constraining the perturbations added to text is a crucial step toward constructing more natural adversarial texts. Unfortunately, recent studies merely select measurements to constrain the added perturbations; none of them explains where these measurements are suitable, which one is better, or how they perform in different kinds of adversarial attacks. In this paper, we fill this gap by comparing the performance of these metrics in various attacks. Furthermore, we propose a stricter constraint for word-level attacks to obtain more imperceptible samples. This constraint also helps enhance existing word-level attacks for adversarial training.
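
To make the idea of a perturbation measurement concrete, the following is a minimal illustrative sketch, not the metrics or the constraint studied in the paper (the abstract does not name them). It assumes two commonly used measurements: Levenshtein edit distance for character-level changes and a word change rate for substitution-only word-level changes. The function names `edit_distance` and `word_change_rate` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def word_change_rate(original: str, perturbed: str) -> float:
    """Fraction of word positions that differ between the two texts.

    Assumption: the attack only substitutes words, so both texts
    have the same number of whitespace-separated tokens.
    """
    orig_words = original.split()
    pert_words = perturbed.split()
    if len(orig_words) != len(pert_words):
        raise ValueError("texts must contain the same number of words")
    if not orig_words:
        return 0.0
    changed = sum(o != p for o, p in zip(orig_words, pert_words))
    return changed / len(orig_words)


if __name__ == "__main__":
    # Character-level perturbation: one character swapped.
    print(edit_distance("the movie was great", "the m0vie was great"))
    # Word-level perturbation: one of five words replaced.
    print(word_change_rate("the movie was great fun", "the film was great fun"))
```

An attack could reject any candidate whose measurement exceeds a chosen threshold; which measurement and threshold are appropriate for which attack is precisely the question the paper examines.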

Updated: 2021-09-30