Label-Only Model Inversion Attacks: Attack With the Least Information,IEEE Transactions on Information Forensics and Security



Label-Only Model Inversion Attacks: Attack With the Least Information
IEEE Transactions on Information Forensics and Security (IF 6.3), Pub Date: 2022-12-29, DOI: 10.1109/tifs.2022.3233190
Tianqing Zhu, Dayong Ye, Shuai Zhou, Bo Liu, Wanlei Zhou


In a model inversion attack, an adversary attempts to reconstruct the training data records of a target model using only the model’s output. In launching a contemporary model inversion attack, the strategies discussed are generally based on either predicted confidence score vectors, i.e., black-box attacks, or the parameters of a target model, i.e., white-box attacks. However, in the real world, model owners usually only give out the predicted labels; the confidence score vectors and model parameters are hidden as a defense mechanism to prevent such attacks. Unfortunately, we have found a model inversion method that can reconstruct representative samples of the target model’s training data based only on the output labels. We believe this attack requires the least information to succeed and, therefore, has the best applicability. The key idea is to exploit the error rate of the target model to compute the median distance from a set of data records to the decision boundary of the target model. The distance is then used to generate confidence score vectors which are adopted to train an attack model to reconstruct the representative samples. The experimental results show that highly recognizable representative samples can be reconstructed with far less information than existing methods.
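
The abstract does not spell out how the error rate is turned into a distance, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes isotropic Gaussian noise, a locally linear decision boundary (under which a point at distance d flips its label with probability Φ(-d/σ)), and a sigmoid mapping from distance to a pseudo-confidence score. The toy target model and every function name and parameter here are hypothetical.

```python
import math
import random
from statistics import NormalDist, median


def label_only_model(x):
    """Hypothetical target model that exposes only a hard label (0 or 1)."""
    w, b = (0.6, 0.8), -1.0  # unit-norm weights, so |w.x + b| is the true distance
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def estimate_distance(model, x, sigma=1.0, trials=2000, rng=None):
    """Estimate the distance from x to the boundary using label flips under noise.

    Assumption: the boundary is locally linear, so the flip rate is Phi(-d / sigma)
    and d can be recovered as sigma * Phi^{-1}(1 - flip_rate).
    """
    rng = rng or random.Random(0)
    base_label = model(x)
    flips = sum(
        model([xi + rng.gauss(0.0, sigma) for xi in x]) != base_label
        for _ in range(trials)
    )
    # Clamp the rate so the inverse CDF stays finite and the distance non-negative.
    rate = min(max(flips / trials, 1.0 / trials), 0.5)
    return sigma * NormalDist().inv_cdf(1.0 - rate)


def pseudo_confidence(model, x, distance, scale=1.0):
    """Map a distance to a two-class score vector (assumed sigmoid mapping)."""
    p = 1.0 / (1.0 + math.exp(-distance / scale))
    return [p, 1.0 - p] if model(x) == 0 else [1.0 - p, p]


# Usage: median distance over a small set of records, then surrogate score vectors.
records = [(2.0, 1.0), (0.0, 0.0), (3.0, 3.0)]
rng = random.Random(42)
distances = [estimate_distance(label_only_model, r, rng=rng) for r in records]
median_distance = median(distances)
scores = [pseudo_confidence(label_only_model, r, median_distance) for r in records]
```

In this sketch the score vectors stand in for the confidence outputs that a label-only model withholds; an attack model could then be trained on them in the same way as in a black-box attack that does see confidence scores.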


Updated: 2024-08-26
