Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations
arXiv - CS - Computation and Language. Pub Date: 2020-09-19, DOI: arxiv-2009.09192
Yuan Zang, Bairu Hou, Fanchao Qi, Zhiyuan Liu, Xiaojun Meng, Maosong Sun

Adversarial attacks aim to fool deep neural networks with adversarial examples. In the field of natural language processing, various textual adversarial attack models have been proposed, varying in the degree of access they require to the victim model. Among them, attack models that only require the victim model's output are better suited to real-world attack scenarios. However, to achieve high attack performance, these models usually need to query the victim model a prohibitively large number of times, which is neither efficient nor practical. To tackle this problem, we propose a reinforcement-learning-based attack model that learns from attack history and launches attacks more efficiently. In experiments, we evaluate our model by attacking several state-of-the-art models on benchmark datasets for multiple tasks, including sentiment analysis, text classification, and natural language inference. Experimental results demonstrate that our model consistently achieves both better attack performance and higher efficiency than recently proposed baseline methods. We also find that our attack model yields larger robustness improvements to the victim model when used for adversarial training. All code and data for this paper will be made public.
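The abstract does not spell out the attack model's internals, so the following is only a minimal illustrative sketch of the general recipe it describes: a word-substitution policy that attacks an output-only (decision-based) victim and is trained with REINFORCE, so that experience from earlier attacks reduces the queries needed for later ones. The toy victim classifier, the synonym table, and the per-word softmax policy below are all hypothetical stand-ins, not the authors' actual model.

```python
import math
import random

random.seed(0)

# --- Toy black-box victim: the attacker only observes its output label. ---
POSITIVE = {"great", "good", "wonderful", "enjoyable"}

def victim_label(tokens):
    """Hypothetical stand-in for the victim model; counts positive keywords."""
    score = sum(1 for t in tokens if t in POSITIVE)
    return "pos" if score >= 2 else "neg"

# --- Candidate substitutes (stand-in for a real synonym resource). ---
SYNONYMS = {
    "great": ["decent", "fine"],
    "good": ["okay", "passable"],
    "wonderful": ["pleasant", "nice"],
    "enjoyable": ["watchable", "tolerable"],
}

theta = {}  # per-word policy parameters, shared across attacks ("attack history")

def attack_episode(tokens, budget=5, train=True, lr=0.5):
    """One attack: sample substitutions from a softmax policy, query the
    victim after each edit, and update theta with REINFORCE if train=True."""
    tokens = list(tokens)
    steps, queries, success = [], 0, False
    for _ in range(budget):
        idxs = [i for i, t in enumerate(tokens) if t in SYNONYMS]
        if not idxs:
            break
        words = [tokens[i] for i in idxs]
        logits = [theta.get(w, 0.0) for w in words]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        probs = [e / z for e in exps]
        k = random.choices(range(len(idxs)), probs)[0]  # pick a position to edit
        steps.append((words, probs, k))
        tokens[idxs[k]] = random.choice(SYNONYMS[words[k]])
        queries += 1  # one victim query per edit
        if victim_label(tokens) != "pos":  # decision-based success check
            success = True
            break
    # Reward successful attacks; penalize each query to favor efficiency.
    reward = (1.0 - 0.1 * queries) if success else -1.0
    if train:
        for words, probs, k in steps:
            for j, w in enumerate(words):
                # d log softmax(k) / d logit_j = 1{j == k} - p_j
                grad = (1.0 if j == k else 0.0) - probs[j]
                theta[w] = theta.get(w, 0.0) + lr * reward * grad
    return queries, success

# Train the policy across many attacks, then attack without learning.
reviews = [
    "a great and wonderful film".split(),
    "good acting and an enjoyable plot".split(),
]
for _ in range(200):
    for r in reviews:
        attack_episode(r)

queries, ok = attack_episode(reviews[0], train=False)
print(f"success={ok}, queries={queries}")
```

The per-query penalty in the reward is what pushes the learned policy toward query efficiency: substitutions that flipped labels cheaply in past episodes receive higher probability in future ones, which is one plausible reading of "learning from attack history."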

Updated: 2020-09-22