BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning
arXiv - CS - Cryptography and Security. Pub Date: 2021-08-01, DOI: arXiv-2108.00352
Jinyuan Jia, Yupei Liu, Neil Zhenqiang Gong

Self-supervised learning in computer vision aims to pre-train an image encoder using a large number of unlabeled images or (image, text) pairs. The pre-trained image encoder can then be used as a feature extractor to build downstream classifiers for many downstream tasks with little or no labeled training data. In this work, we propose BadEncoder, the first backdoor attack against self-supervised learning. In particular, our BadEncoder injects backdoors into a pre-trained image encoder such that the downstream classifiers built on the backdoored image encoder for different downstream tasks simultaneously inherit the backdoor behavior. We formulate BadEncoder as an optimization problem and propose a gradient-descent-based method to solve it, which produces a backdoored image encoder from a clean one. Our extensive empirical evaluation on multiple datasets shows that BadEncoder achieves high attack success rates while preserving the accuracy of the downstream classifiers. We also demonstrate the effectiveness of BadEncoder on two publicly available, real-world image encoders, i.e., Google's image encoder pre-trained on ImageNet and OpenAI's Contrastive Language-Image Pre-training (CLIP) image encoder pre-trained on 400 million (image, text) pairs collected from the Internet. Moreover, we consider defenses including Neural Cleanse and MNTD (empirical defenses) as well as PatchGuard (a provable defense). Our results show that these defenses are insufficient against BadEncoder, highlighting the need for new defenses. Our code is publicly available at: https://github.com/jjy1994/BadEncoder.
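The optimization idea described above can be illustrated with a minimal toy sketch. This is not the authors' method or code: it replaces the image encoder with a tiny linear map, uses a hypothetical additive trigger, random stand-in "shadow" inputs, and a random target embedding, and minimizes (by hand-derived gradient descent) an effectiveness term that pulls triggered inputs toward the target embedding plus a utility term that keeps clean inputs close to the clean encoder's output:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_emb = 8, 4
W_clean = rng.normal(size=(d_emb, d_in))    # stand-in for the pre-trained (clean) encoder
W = W_clean.copy()                          # backdoored encoder, initialized from the clean one

xs = rng.normal(size=(16, d_in))            # hypothetical unlabeled shadow inputs
trigger = np.zeros(d_in)
trigger[0] = 3.0                            # hypothetical additive trigger pattern
target_emb = rng.normal(size=d_emb)         # stand-in embedding of the attacker's reference input

lr, lam = 0.01, 1.0                         # step size and utility-term weight (illustrative values)

def losses(W):
    """Effectiveness loss (triggered inputs -> target embedding) and
    utility loss (clean inputs keep their clean embeddings)."""
    eff = np.mean([np.sum((W @ (x + trigger) - target_emb) ** 2) for x in xs])
    uti = np.mean([np.sum((W @ x - W_clean @ x) ** 2) for x in xs])
    return eff, uti

for step in range(500):
    grad = np.zeros_like(W)
    for x in xs:
        xt = x + trigger
        # d/dW ||W xt - target||^2 = 2 (W xt - target) xt^T   (effectiveness term)
        grad += 2 * np.outer(W @ xt - target_emb, xt) / len(xs)
        # d/dW ||(W - W_clean) x||^2 = 2 (W - W_clean) x x^T  (utility term)
        grad += lam * 2 * np.outer((W - W_clean) @ x, x) / len(xs)
    W -= lr * grad

eff, uti = losses(W)
```

After optimization the effectiveness loss drops well below its initial value while the utility term limits drift on clean inputs, mirroring (in miniature) the trade-off the paper's optimization problem encodes.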

Updated: 2021-08-03