Categorical Inference Poisoning: Verifiable Defense Against Black-Box DNN Model Stealing Without Constraining Surrogate Data and Query Times
IEEE Transactions on Information Forensics and Security (IF 6.8), Pub Date: 2023-02-10, DOI: 10.1109/tifs.2023.3244107
Haitian Zhang, Guang Hua, Xinya Wang, Hao Jiang, Wen Yang

Deep Neural Network (DNN) models have offered powerful solutions for a wide range of tasks, but the cost to develop such models is nontrivial, which calls for effective model protection. Although black-box distribution can mitigate some threats, model functionality can still be stolen via black-box surrogate attacks. Recent studies have shown that surrogate attacks can be launched in several ways, whereas existing defense methods commonly assume attackers with insufficient in-distribution (ID) data and restricted attack strategies. In this paper, we relax these constraints and assume a practical threat model in which the adversary not only has sufficient ID data and query times but also can adjust the surrogate training data labeled by the victim model. We then propose a two-step categorical inference poisoning (CIP) framework, featuring both poisoning for performance degradation (PPD) and poisoning for backdooring (PBD). In the first poisoning step, incoming queries are classified into ID and out-of-distribution (OOD) ones using an energy score (ES) based OOD detector, and the latter are further classified into high-ES and low-ES ones, which are subsequently passed to a strong and a weak PPD process, respectively. In the second poisoning step, difficult ID queries are detected by a proposed reliability score (RS) measurement and are passed to PBD. In doing so, the first-step OOD poisoning leads to substantial performance degradation in surrogate models, and the second-step ID poisoning further embeds backdoors in them, while both preserve model fidelity. Extensive experiments confirm that CIP not only achieves promising performance against state-of-the-art black-box surrogate attacks such as KnockoffNets and data-free model extraction (DFME) but also works well against stronger attacks with sufficient ID and deceptive data, outperforming the existing dynamic adversarial watermarking (DAWN) and deceptive perturbation defense methods. PyTorch code is available at https://github.com/Hatins/CIP_master.git.
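To make the first poisoning step concrete, the following is a minimal PyTorch sketch of an energy-score-based query router. It assumes only the standard energy score E(x) = -T * logsumexp(f(x)/T) from energy-based OOD detection; the thresholds (tau_id, tau_strong) and the poison_strong / poison_weak routines are hypothetical placeholders for illustration and do not reproduce the paper's actual PPD design (see the authors' repository for the real implementation).

import torch
import torch.nn.functional as F

def energy_score(logits: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    # Standard energy score: E(x) = -T * logsumexp(f(x) / T).
    # ID inputs typically yield lower energy than OOD inputs.
    return -T * torch.logsumexp(logits / T, dim=1)

def poison_strong(p: torch.Tensor) -> torch.Tensor:
    # Hypothetical strong PPD: push all probability mass onto the least-likely class.
    idx = p.argmin(dim=1, keepdim=True)
    return torch.zeros_like(p).scatter_(1, idx, 1.0)

def poison_weak(p: torch.Tensor) -> torch.Tensor:
    # Hypothetical weak PPD: mildly flatten the posterior toward uniform.
    return 0.5 * p + 0.5 / p.size(1)

@torch.no_grad()
def answer_queries(victim: torch.nn.Module, x: torch.Tensor,
                   tau_id: float = -5.0, tau_strong: float = 0.0) -> torch.Tensor:
    """Route incoming queries by energy score and poison the OOD ones."""
    logits = victim(x)
    clean = F.softmax(logits, dim=1)     # honest posterior for ID queries
    es = energy_score(logits)

    is_ood = es > tau_id                 # step 1: split ID vs. OOD queries
    strong = is_ood & (es > tau_strong)  # high-ES OOD -> strong PPD
    weak = is_ood & ~strong              # low-ES OOD  -> weak PPD

    out = clean.clone()
    if strong.any():
        out[strong] = poison_strong(clean[strong])
    if weak.any():
        out[weak] = poison_weak(clean[weak])
    return out                           # ID queries are returned unpoisoned here

The second poisoning step would apply an analogous threshold test on the paper's reliability score (RS) to divert difficult ID queries to PBD; since the RS definition is not given in the abstract, it is not sketched here.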

Updated: 2023-02-10