ONION: A Simple and Effective Defense Against Textual Backdoor Attacks
arXiv - CS - Computers and Society, Pub Date: 2020-11-20, DOI: arxiv-2011.10369
Fanchao Qi, Yangyi Chen, Mukai Li, Zhiyuan Liu, Maosong Sun

Backdoor attacks are an emerging training-time threat to deep neural networks (DNNs). They can manipulate the output of DNNs and are highly insidious. In the field of natural language processing, several attack methods have been proposed and achieve very high attack success rates on multiple popular models. Nevertheless, there has been little research on defending against textual backdoor attacks. In this paper, we propose a simple and effective textual backdoor defense named ONION, which is based on outlier word detection and might be the first method that can handle all the attack situations. Experiments demonstrate the effectiveness of our model in blocking two of the latest backdoor attack methods.
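
The abstract says only that ONION is based on outlier word detection and gives no further detail. The sketch below illustrates one way such detection could work, under stated assumptions: a word is treated as an outlier if removing it lowers the sentence's language-model perplexity by more than a threshold. The toy unigram model, the function names (`make_unigram_perplexity`, `filter_outlier_words`), the example corpus, and the threshold value are all hypothetical and stand in for whatever language model and settings a real defense would use.

```python
import math
from collections import Counter
from typing import Callable, List


def make_unigram_perplexity(corpus: List[str]) -> Callable[[List[str]], float]:
    """Build a toy add-one-smoothed unigram perplexity function.

    Assumption: this is only a stand-in for a real language model.
    """
    counts = Counter(w for line in corpus for w in line.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen words

    def perplexity(tokens: List[str]) -> float:
        if not tokens:
            return 1.0
        log_prob = sum(
            math.log((counts[t.lower()] + 1) / (total + vocab)) for t in tokens
        )
        return math.exp(-log_prob / len(tokens))

    return perplexity


def filter_outlier_words(
    tokens: List[str],
    perplexity: Callable[[List[str]], float],
    threshold: float,
) -> List[str]:
    """Drop words whose removal lowers perplexity by more than `threshold`."""
    base = perplexity(tokens)
    kept = []
    for i, tok in enumerate(tokens):
        without = tokens[:i] + tokens[i + 1:]
        # Large positive score: the text looks more natural without this word.
        score = base - perplexity(without)
        if score <= threshold:
            kept.append(tok)
    return kept


if __name__ == "__main__":
    # Hypothetical clean corpus and a test sentence with an inserted rare token.
    clean_corpus = [
        "the movie was great",
        "the film was bad",
        "i loved the movie",
        "the plot was great",
    ]
    ppl = make_unigram_perplexity(clean_corpus)
    print(filter_outlier_words("the movie cf was great".split(), ppl, threshold=1.0))
```

Passing the scoring function in as a parameter keeps the filtering loop independent of the language model, so the toy unigram scorer could be swapped for a stronger model without changing the detection logic. The threshold would need tuning on clean validation data.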

Updated: 2020-11-23