Generating Natural Language Adversarial Examples on a Large Scale with Generative Models
arXiv - CS - Computation and Language. Pub Date: 2020-03-10, DOI: arxiv-2003.10388
Yankun Ren, Jianbin Lin, Siliang Tang, Jun Zhou, Shuang Yang, Yuan Qi, and Xiang Ren

Text classification models are now widely deployed. However, these classifiers can be easily fooled by adversarial examples. Fortunately, standard attack methods generate adversarial texts in a pair-wise way: an adversarial text can only be created from a real-world text by replacing a few words. In many applications, such source texts are limited in number, so the corresponding adversarial examples are often not diverse enough and are sometimes hard to read; they can therefore be easily detected by humans and cannot create chaos on a large scale. In this paper, we propose an end-to-end solution that efficiently generates adversarial texts from scratch with generative models, rather than being restricted to perturbing given texts. We call this unrestricted adversarial text generation. Specifically, we train a conditional variational autoencoder (VAE) with an additional adversarial loss that guides the generation of adversarial examples. Moreover, to improve the validity of the adversarial texts, we use discriminators and the training framework of generative adversarial networks (GANs) to keep adversarial texts consistent with real data. Experimental results on sentiment analysis demonstrate the scalability and efficiency of our method: it attacks text classification models with a higher success rate than existing methods while maintaining acceptable quality for human readers.
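The training setup described in the abstract (a conditional VAE whose decoder is steered by an adversarial loss against a victim classifier, plus a GAN-style discriminator for validity) can be illustrated with a short PyTorch sketch. This is a minimal illustration of the idea, not the authors' released code: the module sizes, the soft-token relaxation, the equal loss weights, and the stand-in victim/discriminator networks are all assumptions made to keep the example self-contained and runnable.

```python
# Minimal sketch (not the authors' released code) of a conditional VAE whose
# decoder is trained with (1) reconstruction + KL, (2) an adversarial loss that
# pushes a frozen victim classifier toward the wrong label, and (3) a GAN-style
# discriminator loss that keeps generated text close to real data.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMB, HID, LAT, NUM_CLASSES = 5000, 128, 256, 64, 2


class CondVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB, EMB)
        self.lab_emb = nn.Embedding(NUM_CLASSES, EMB)
        self.enc = nn.GRU(EMB, HID, batch_first=True)
        self.to_mu = nn.Linear(HID + EMB, LAT)
        self.to_logvar = nn.Linear(HID + EMB, LAT)
        self.dec_init = nn.Linear(LAT + EMB, HID)
        self.dec = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, tokens, labels):
        # Encode text + class label into a Gaussian posterior over z.
        _, h = self.enc(self.tok_emb(tokens))                     # h: (1, B, HID)
        h = torch.cat([h[-1], self.lab_emb(labels)], dim=-1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterize
        # Teacher-forced decoding conditioned on z and the label:
        # predict token t+1 from tokens up to t.
        h0 = torch.tanh(self.dec_init(torch.cat([z, self.lab_emb(labels)], dim=-1)))
        dec_out, _ = self.dec(self.tok_emb(tokens[:, :-1]), h0.unsqueeze(0))
        return self.out(dec_out), mu, logvar                      # (B, T-1, VOCAB)


class SoftTextScorer(nn.Module):
    """Scores a soft (B, T, VOCAB) token distribution; stands in for both the
    pre-trained victim classifier and the GAN discriminator in this sketch."""
    def __init__(self, out_dim):
        super().__init__()
        self.emb = nn.Linear(VOCAB, EMB, bias=False)   # soft embedding lookup
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.head = nn.Linear(HID, out_dim)

    def forward(self, soft_tokens):
        _, h = self.rnn(self.emb(soft_tokens))
        return self.head(h[-1])


def generator_step(vae, victim, disc, tokens, labels, opt):
    """One hypothetical generator update; `victim` is assumed pre-trained and
    frozen, and `opt` holds only the VAE's parameters. The discriminator's own
    real-vs-generated update is omitted for brevity."""
    logits, mu, logvar = vae(tokens, labels)
    # 1) Conditional-VAE objective: reconstruction + KL to the prior.
    recon = F.cross_entropy(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Relax discrete tokens to a soft distribution so gradients can flow
    # from the victim and the discriminator back into the decoder.
    soft = F.softmax(logits, dim=-1)
    # 2) Adversarial loss: steer the victim toward the *wrong* label.
    wrong = (labels + 1) % NUM_CLASSES
    adv = F.cross_entropy(victim(soft), wrong)
    # 3) Validity loss: fool the discriminator into rating the text as real.
    real = torch.ones(tokens.size(0), 1)
    validity = F.binary_cross_entropy_with_logits(disc(soft), real)
    loss = recon + kl + adv + validity   # equal weights, purely for brevity
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

At attack time, one would sample z from the standard normal prior and decode conditioned on the desired label, so adversarial texts are produced from scratch rather than by editing an existing sentence; that is what distinguishes this unrestricted setting from pair-wise, perturbation-based attacks.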

Updated: 2020-03-24