Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations
arXiv - CS - Computation and Language Pub Date : 2020-09-19 , DOI: arxiv-2009.09192 Yuan Zang, Bairu Hou, Fanchao Qi, Zhiyuan Liu, Xiaojun Meng, Maosong Sun
Adversarial attacks aim to fool deep neural networks with adversarial examples. In natural language processing, a variety of textual adversarial attack models have been proposed, differing in how much access they require to the victim model. Among them, attack models that require only the victim model's output are better suited to real-world attack scenarios. However, to achieve high attack performance, these models usually need to query the victim model a large number of times, which is neither efficient nor practical. To tackle this problem, we propose a reinforcement-learning-based attack model that can learn from attack history and launch attacks more efficiently. In experiments, we evaluate our model by attacking several state-of-the-art models on benchmark datasets for multiple tasks, including sentiment analysis, text classification, and natural language inference. Experimental results demonstrate that our model consistently achieves both better attack performance and higher efficiency than recently proposed baseline methods. We also find that our attack model can bring greater robustness improvements to the victim model through adversarial training. All code and data for this paper will be made public.
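To make the black-box setting concrete, here is a minimal, purely illustrative sketch of a query-based word-substitution attack that reuses "attack history" via a running score per substitution, loosely in the spirit of the paper's reinforcement-learning idea. The victim model, synonym table, and scoring rule are all invented for demonstration and are not the paper's actual method.

```python
# Illustrative sketch only: toy black-box attack guided by scores
# accumulated from past attacks. All components here are hypothetical.

POSITIVE_WORDS = {"great", "good", "excellent", "wonderful"}

def victim(text):
    """Black-box victim: we only observe its output label (1 = positive)."""
    return int(any(w in POSITIVE_WORDS for w in text.split()))

# Hypothetical synonym candidates for substitution.
SYNONYMS = {
    "great": ["fine", "decent"],
    "good": ["okay", "acceptable"],
    "excellent": ["solid", "passable"],
}

def attack(text, policy, max_queries=20):
    """Try substitutions in the order the learned scores suggest; return
    (adversarial_text, queries_used), with adversarial_text None on failure."""
    words = text.split()
    original = victim(text)          # one query to get the original label
    queries = 1
    actions = [(i, s) for i, w in enumerate(words) for s in SYNONYMS.get(w, [])]
    # Substitutions that succeeded in past attacks score higher and are tried
    # first, so fewer victim-model queries are spent on unpromising edits.
    actions.sort(key=lambda a: -policy.get(a[1], 0.0))
    for i, syn in actions:
        if queries >= max_queries:
            break
        trial = words[:]
        trial[i] = syn
        queries += 1
        if victim(" ".join(trial)) != original:
            policy[syn] = policy.get(syn, 0.0) + 1.0   # reward: remember success
            return " ".join(trial), queries
        policy[syn] = policy.get(syn, 0.0) - 0.1       # penalty: deprioritize
    return None, queries

policy = {}
adv, q = attack("the movie was great", policy)
```

The `policy` dictionary persists across attacks, so substitutions that flipped labels before are tried earlier on later inputs; this stands in (very crudely) for the learned policy that reduces query counts in the paper.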
Updated: 2020-09-22