APMSA: Adversarial Perturbation Against Model Stealing Attacks
IEEE Transactions on Information Forensics and Security (IF 6.8), Pub Date: 2023-02-20, DOI: 10.1109/tifs.2023.3246766
Jiliang Zhang, Shuang Peng, Yansong Gao, Zhi Zhang, Qinghui Hong

Training a Deep Learning (DL) model requires proprietary data and computation-intensive resources. To recoup their training costs, model providers can monetize DL models through Machine Learning as a Service (MLaaS). Typically, the model is deployed in the cloud and exposed through a publicly accessible Application Programming Interface (API) that serves paid queries. However, model stealing attacks threaten this monetization scheme: an adversary queries the targeted model to obtain input-output pairs and reverse-engineers a substitute model that replicates its internal working mechanism, thereby avoiding payment for future extensive queries, eroding the model owner's business advantage, and leaking the privacy of the model. In this work, we observe that the confidence vector or the top-1 confidence returned by the model under attack (MUA) varies to a relatively large degree across different queried inputs. Rich internal information of the MUA is therefore leaked to the attacker, facilitating her reconstruction of a substitute model. We thus propose to leverage adversarial confidence perturbation to hide this varied confidence distribution across queries, thereby defending against model stealing attacks (dubbed APMSA). In other words, the confidence vectors returned for queries from a specific category are now similar, considerably reducing information leakage of the MUA. To achieve this objective, we constructively add, through automated optimization, delicate noise to each input query so that its confidence is pushed close to the decision boundary of the MUA. This process resembles crafting adversarial examples, with the distinction that the hard label is preserved to be the same as that of the queried input. The defense thus retains the inference utility for normal users (i.e., without sacrificing inference accuracy) while confining the confidence information leaked to the attacker within a small constrained region (i.e., close to the decision boundary), which greatly deteriorates the accuracy of the attacker's substitute model. As APMSA serves as a plug-in front end and requires no change to the MUA, it is generic and easy to deploy. The high efficacy of APMSA is validated through experiments on the CIFAR10 and GTSRB datasets. For a ResNet-18 MUA on CIFAR10, our defense degrades the accuracy of the stolen model by up to 15% (rendering the stolen model largely useless) with a 0% accuracy drop for normal users' hard-label inference requests.
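
The sketch below illustrates one way the mechanism described in the abstract could look in PyTorch: a small, bounded perturbation is optimized per query so that the returned confidence vector moves toward the decision boundary while the hard label is kept unchanged. This is a minimal sketch under stated assumptions, not the authors' implementation; the function name `flatten_confidence`, the entropy-based objective, and the hyper-parameters are illustrative choices.

```python
# A minimal sketch of the defense described above, assuming a PyTorch
# classifier `mua` (the model under attack) deployed behind the API.
# NOT the authors' implementation: `flatten_confidence`, the entropy-based
# objective, and the default hyper-parameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def flatten_confidence(mua, x, steps=30, alpha=1e-2, eps=8 / 255):
    """Perturb the queried input x so that the returned softmax vector is
    pushed toward the decision boundary (near-uniform confidence) while
    the top-1 (hard) label stays the same as for the clean input."""
    mua.eval()
    with torch.no_grad():
        clean_label = mua(x).argmax(dim=1)            # hard label to preserve
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        probs = F.softmax(mua(x + delta), dim=1)
        # Negative entropy: minimizing it drives the confidence toward uniform,
        # i.e., close to the decision boundary of the MUA.
        loss = (probs * probs.clamp_min(1e-12).log()).sum(dim=1).mean()
        loss.backward()
        with torch.no_grad():
            prev = delta.detach().clone()
            delta -= alpha * delta.grad.sign()        # signed gradient step
            delta.clamp_(-eps, eps)                   # keep the noise delicate
            # Roll back any step that would flip the hard label.
            flipped = mua(x + delta).argmax(dim=1) != clean_label
            delta[flipped] = prev[flipped]
        delta.grad.zero_()
    with torch.no_grad():
        # The API returns this flattened confidence vector instead of the
        # raw model output; the argmax (hard label) is unchanged.
        return F.softmax(mua(x + delta), dim=1)
```

Because the perturbation is applied on the server side before the response leaves the API, a normal user still receives the correct hard label, while an attacker training a substitute model on these near-boundary confidence vectors obtains far less usable signal.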

Updated: 2023-02-20