Categorical Inference Poisoning: Verifiable Defense Against Black-Box DNN Model Stealing Without Constraining Surrogate Data and Query Times
IEEE Transactions on Information Forensics and Security (IF 6.3), Pub Date: 2-10-2023, DOI: 10.1109/tifs.2023.3244107
Haitian Zhang, Guang Hua, Xinya Wang, Hao Jiang, Wen Yang

Deep Neural Network (DNN) models have offered powerful solutions for a wide range of tasks, but the cost to develop such models is nontrivial, which calls for effective model protection. Although black-box distribution can mitigate some threats, model functionality can still be stolen via black-box surrogate attacks. Recent studies have shown that surrogate attacks can be launched in several ways, while the existing defense methods commonly assume attackers with insufficient in-distribution (ID) data and restricted attacking strategies. In this paper, we relax these constraints and assume a practical threat model in which the adversary not only has sufficient ID data and query times but also can adjust the surrogate training data labeled by the victim model. Then, we propose a two-step categorical inference poisoning (CIP) framework, featuring both poisoning for performance degradation (PPD) and poisoning for backdooring (PBD). In the first poisoning step, incoming queries are classified into ID and out-of-distribution (OOD) ones using an energy score (ES) based OOD detector, and the latter are further split into high-ES and low-ES ones, which are subsequently passed to a strong and a weak PPD process, respectively. In the second poisoning step, difficult ID queries are detected by a proposed reliability score (RS) measurement and are passed to PBD. In doing so, the first-step OOD poisoning leads to substantial performance degradation in surrogate models, the second-step ID poisoning further embeds backdoors in them, while both preserve model fidelity. Extensive experiments confirm that CIP not only achieves promising performance against state-of-the-art black-box surrogate attacks such as KnockoffNets and data-free model extraction (DFME) but also works well against stronger attacks with sufficient ID and deceptive data, outperforming the existing dynamic adversarial watermarking (DAWN) and deceptive perturbation defense methods. PyTorch code is available at https://github.com/Hatins/CIP_master.git.
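
For readers unfamiliar with energy-score-based OOD detection, the sketch below illustrates in PyTorch how such a query-routing front end could look in principle. The energy-score formula follows the standard definition E(x) = -T * logsumexp(f(x)/T) from Liu et al. (2020); the thresholds TAU_OOD, TAU_STRONG, and TAU_RS, as well as the max-softmax stand-in for the paper's reliability score, are illustrative assumptions and not the authors' actual implementation (see the linked repository for that).

import torch
import torch.nn.functional as F

# Hypothetical thresholds: the abstract does not give the values used in CIP.
TAU_OOD = -5.0     # energy scores above this are treated as OOD
TAU_STRONG = -2.0  # splits OOD queries into high-ES (strong PPD) and low-ES (weak PPD)
TAU_RS = 0.6       # reliability threshold flagging "difficult" ID queries for PBD

def energy_score(logits: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    # Energy score E(x) = -T * logsumexp(f(x) / T); OOD inputs tend to have higher energy.
    return -T * torch.logsumexp(logits / T, dim=-1)

def route_query(logits: torch.Tensor) -> str:
    # Decide which poisoning branch a single query's victim-model logits should go to.
    es = energy_score(logits).item()
    if es > TAU_OOD:  # OOD query: poison the response to degrade surrogate performance
        return "strong_PPD" if es > TAU_STRONG else "weak_PPD"
    # ID query: use max-softmax confidence as a stand-in reliability score (RS);
    # the paper defines its own RS measurement, which is not reproduced here.
    rs = F.softmax(logits, dim=-1).max().item()
    return "PBD" if rs < TAU_RS else "clean_response"

# Toy usage: random logits standing in for victim-model outputs on four queries.
victim_logits = torch.randn(4, 10)
print([route_query(l) for l in victim_logits])

In this reading, only the responses to suspicious queries are perturbed (strongly for high-ES OOD queries, weakly for low-ES ones, and via backdoor-inducing labels for difficult ID queries), which is how the defense can degrade and watermark surrogate models while preserving fidelity for benign users.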

Updated: 2024-08-28