Towards explainable model extraction attacks
International Journal of Intelligent Systems (IF 7) Pub Date: 2022-09-08, DOI: 10.1002/int.23022
Anli Yan, Ruitao Hou, Xiaozhang Liu, Hongyang Yan, Teng Huang, Xianmin Wang

A key factor in boosting the adoption of artificial intelligence (AI) in security-sensitive domains is to use it responsibly, which entails providing explanations for AI decisions. To date, a plethora of explainable artificial intelligence (XAI) techniques have been proposed to help users interpret model decisions. However, given its data-driven nature, the explanation itself potentially carries a high risk of exposing private information. In this paper, we first show that existing XAI is vulnerable to model extraction attacks and then present an XAI-aware dual-task model extraction attack (DTMEA). DTMEA can attack a target model that offers explanation services, that is, it can extract both the classification and explanation tasks of the target model. More specifically, the substitute model extracted by DTMEA is a multitask learning architecture, consisting of a shared layer and two task-specific layers for classification and explanation. To reveal which explanation techniques are more vulnerable to exposing private information, we conduct an empirical evaluation of four major explanation types on benchmark data sets. Experimental results show that the attack accuracy of DTMEA outperforms the prediction-only method by up to 1.25%, 1.53%, 9.25%, and 7.45% on MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100, respectively. By exposing the potential threats of explanation techniques, our research offers insights for developing effective tools that can trade off these security-sensitive relationships.
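To make the dual-task substitute model concrete, the sketch below shows one way such an architecture could look: a shared convolutional feature extractor feeding two task-specific heads, one predicting class labels and one regressing the target model's returned explanation map. This is a minimal illustrative sketch, not the paper's implementation; the layer sizes, head designs, and loss weighting are all assumptions.

```python
# Minimal sketch of a dual-task substitute model (illustrative only; layer
# sizes, head designs, and the 0.5 loss weight are assumptions, not the
# paper's configuration).
import torch
import torch.nn as nn

class DualTaskSubstitute(nn.Module):
    def __init__(self, num_classes: int = 10, img_channels: int = 3):
        super().__init__()
        # Shared layer: features used by both task-specific heads.
        self.shared = nn.Sequential(
            nn.Conv2d(img_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Task-specific head 1: classification (mimics the target's labels).
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes),
        )
        # Task-specific head 2: per-pixel explanation map (e.g., saliency).
        self.explainer = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, x):
        h = self.shared(x)
        return self.classifier(h), self.explainer(h)

# Training would minimize a weighted sum of a classification loss against the
# target model's predicted labels and a regression loss against its returned
# explanation maps.
model = DualTaskSubstitute()
images = torch.randn(4, 3, 32, 32)
logits, expl_map = model(images)
target_labels = torch.randint(0, 10, (4,))        # labels queried from the target model
target_expl = torch.zeros(4, 1, 32, 32)           # explanation maps queried from the target model
loss = nn.CrossEntropyLoss()(logits, target_labels) \
     + 0.5 * nn.MSELoss()(expl_map, target_expl)
```

Under this sketch, each query to the target model yields both a label and an explanation, and both signals supervise the shared backbone, which is what lets the explanation channel leak additional information about the target model's decision boundary.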

Updated: 2022-09-08