Abstract

In recent years, machine learning approaches have been widely adopted for many applications, including classification. Machine learning models that deal with collective sensitive data are usually trained on a remote public cloud server, for instance, in a machine learning as a service (MLaaS) system. In this setting, users upload their local data and use the server's computation capability to train models, or they directly access models trained by the MLaaS provider. Unfortunately, recent works reveal that both the curious server (which trains the model with users' sensitive local data and is curious about information on individuals) and the malicious MLaaS user (who abuses query access to the MLaaS system) pose privacy risks. Adversarial methods, as one typical mitigation, have been studied in several recent works. However, most of them focus on protecting privacy against the malicious user; in other words, they commonly treat the data owner and the model provider as one role. Under this assumption, the privacy leakage risks from the curious server are neglected. Differential privacy methods can defend against privacy threats from both the curious server and the malicious MLaaS user by directly adding noise to the training data; however, they heavily decrease the classification accuracy of the target model. In this work, we propose a generic privacy-preserving framework based on the adversarial method to defend against both the curious server and the malicious MLaaS user. The framework can be adapted to several adversarial algorithms to generate adversarial examples directly from the data owners' original data, so that sensitive information about the original data is hidden. We then explore the constraint conditions of this framework, which help us find the balance between privacy protection and model utility. Experimental results show that our defense framework with the AdvGAN method is effective against MIA and that our framework with the FGSM method can protect sensitive data from direct content exposure attacks. In addition, our method achieves a better balance between privacy and utility than existing methods.

1. Introduction

In recent years, machine learning technology has rapidly gained popularity, as its models can be improved automatically by learning from training data. Benefiting from developments that enable more efficient storage, processing, and computation, machine learning models are widely applied in the real world to classification and regression tasks in many domains, such as image classification [1], speech recognition [2], healthcare data management [3], and financial analysis [4]. The samples used to train these models often contain individuals' sensitive information, such as patients' healthcare records, personal preferences, and personal photos. For instance, image classification, especially facial recognition, is applied in many scenarios, including activity monitoring and identification. Healthcare record management requires clinical features and medical data from patients to learn a model for diagnosis and prognosis. In finance, individuals' trading records, price histories, and other data are used to build machine learning systems for financial prediction and risk analysis.

Machine learning models that deal with collective sensitive data are usually trained on a remote public cloud server, for instance, in a machine learning as a service (MLaaS) system. In this setting, users upload their local data and use the server's computation capability to train models, or they directly access models trained by the MLaaS provider. Unfortunately, recent works reveal that both the curious server (which trains the model with users' sensitive local data and is curious about information on individuals) and the malicious MLaaS user (who abuses query access to the MLaaS system) pose privacy risks. For example, Shokri et al. [5] proposed the membership inference attack (MIA) against machine learning models. The MIA adversary trains an attack model to speculate whether a given data record was a member of the target model's training dataset. These privacy risks not only directly violate the privacy of the training set but also can be a gateway to further attacks [6]; that is, some adversaries use the inferred information to mount other security threats. For example, if an individual's biometric features are inferred, the attacker can use this information to impersonate the victim or obtain unauthorized access. Such follow-up attacks may cause additional severe information leakage or other serious security risks [5].

Adversarial methods are among the typical mitigations for machine learning privacy risks. Nasr et al. [7] formalized privacy-preserving training as a min-max game that minimizes the classification error of the target model while minimizing the inference adversary's maximum gain, and they used stochastic gradient descent to optimize the min-max problem. Jia et al. [8] proposed MemGuard, which defends against black-box MIAs by adding a carefully crafted noise vector to the target model's output confidence score vector. Wang et al. [9] jointly formulated model compression and MIA defense as MCMIA, using model compression against MIAs in deep neural networks. However, most of these works focus on protecting privacy against the malicious user; in other words, they commonly treat the data owner and the model provider as one role, and the adversarial perturbation objective always targets the target model or its confidence scores. Under this assumption, the privacy leakage risks from the curious server are neglected, so these defenses are not suitable when the data owner uploads local data to the MLaaS system to train a model remotely. That is, the privacy leakage risks from the model provider are commonly neglected, and the owner's data privacy cannot be preserved entirely. Differential privacy methods can defend against privacy threats from both the curious server and the malicious MLaaS user by directly adding noise to the training data. Nonetheless, differential privacy heavily decreases the classification accuracy of the target model.

In this paper, we investigate and categorize existing adversarial perturbation-based defenses. We then propose a generic privacy-preserving framework based on the adversarial method to defend against both the curious server and the malicious MLaaS user. The framework can be adapted to several adversarial algorithms that generate adversarial examples directly from data owners' original data, so that sensitive information about the original data is hidden. In addition, we explore the constraint conditions of this framework, which help us find the balance between privacy protection and model utility. The framework consists of an adversarial perturbation generator, shadow models, and a privacy evaluator. The generator learns to perturb the local user's original data; we leverage the adversarial method to diminish the effect of membership inference by generating adversarial examples of the original training data. The privacy evaluator then detects whether privacy risks remain. The feedback scheme and the constraints are used to find the balance between privacy-preserving and the performance of the target model. The main contributions of this paper are summarized as follows:
(i) Taxonomy of existing adversarial method-based privacy risk mitigations. We present a pioneering study that categorizes adversarial method-based defenses by their perturbation objects and analyzes the limitations of existing mitigations. This work helps build a general picture of the field.
(ii) A generic defense framework for adversarial methods against privacy risks. We use a feedback mechanism and constraint conditions to develop a common adversarial defense framework for machine learning privacy-preserving, and we explore the constraint conditions that make our method more effective. This framework can protect the privacy of the original training data in data sharing and MLaaS scenarios. We implement three different adversarial example algorithms as privacy-preserving mitigations within our generic framework and use MIA as the classical privacy leakage evaluator to verify the effectiveness of these three defenses.
(iii) Comprehensive analysis of the factors influencing the proposed method. We investigate defense factors, including the adversarial algorithm, perturbation rate, adversarial distance, and data type, showing that, with appropriate settings, our defense framework with the AdvGAN method is effective against MIA and our defense framework with the FGSM method can protect the sensitive original data from direct content exposure attacks. In addition, our method achieves a better privacy-utility balance than existing methods such as the min-max method and differential privacy. Furthermore, the experimental results show that we can control the defense performance with parameters such as the distortion rate and the distance threshold.

2. Background and Related Work

In this section, we present background and related work on privacy threats in data sharing and MLaaS scenarios and briefly review membership inference attacks and machine learning privacy-preserving techniques proposed in previous works.

2.1. Privacy Threats in Data Sharing and MLaaS Scenes

Fatemehsadat et al. [10] divided existing threats to privacy in machine learning systems into two main categories: direct and indirect information exposure. In direct threats, the adversary can gain direct access to sensitive datasets; private data can be exposed through data sharing or through a cloud service that receives the data to process it [10]. For example, in an MLaaS platform, training data must be revealed to the service, and if the MLaaS operators are malicious, they can directly access the sensitive training data even though users may not want to reveal their private information. In indirect information exposure, the attacker attempts to infer or guess the information without access to the actual data [10]. A number of works have focused on indirect information exposure, such as model extraction attacks [11], model inversion [12], and membership inference attacks [5]. Tramer et al. [11] explored model extraction attacks that exploit the tension between query access and confidentiality in machine learning models and aim to copy the functionality of victim models in the MLaaS setting. Fredrikson et al. [13] devised the model inversion attack, in which an adversary with access to a machine learning model abuses it to learn sensitive genomic information about individuals; they further developed a new class of model inversion attacks based on prediction results from MLaaS APIs [12]. Shokri et al. [5] proposed the membership inference attack, performed entirely through MLaaS services. Model extraction attacks threaten the confidentiality of the victim model, while model inversion and membership inference attacks threaten the privacy of the training data. The privacy risks in data sharing and MLaaS come not only from external adversaries but also from the curious server and other malicious users.

2.2. Membership Inference Attacks (MIAs)

In an MIA, the adversary queries the victim model to determine whether a particular sample is included in the victim model's training set. Previous research demonstrates that overfitting is one of the causes of MIAs [14]. Shokri et al. [5] introduced the first MIA against machine learning models and cast MIA as a binary classification problem. In their method, the attack proceeds as follows. First, the attacker prepares a set of candidate data records. Next, the attacker queries the victim model and obtains a classification probability vector for each record. The adversary then trains shadow models to craft training data for the attack model, trains the attack model, and uses it to determine whether the candidate samples are members of the target model's training dataset. For each candidate record there are two possible classes: "member" means the candidate is in the target model's training dataset, and "nonmember" means it is not. Hayes et al. [6] presented the first MIAs against generative adversarial networks (GANs). ML-Leaks [15] relaxed several assumptions of Shokri et al.'s work [5], such as the number of shadow models, knowledge of the target model's structure, and knowledge of the target model's dataset.
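As a concrete illustration of this pipeline, the following Python sketch trains shadow models, builds a single binary attack model from their confidence vectors, and then labels candidate records as members or nonmembers. It is a simplified outline (Shokri et al. train one attack model per class, and the scikit-learn models here are stand-ins), not the original implementation.

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

def build_attack_model(shadow_splits):
    # shadow_splits: list of (X_in, y_in, X_out, y_out), one split per shadow model,
    # where X_in was used to train that shadow model and X_out was held out.
    feats, membership = [], []
    for X_in, y_in, X_out, y_out in shadow_splits:
        shadow = MLPClassifier(max_iter=200).fit(X_in, y_in)      # shadow model mimics the target
        feats.append(shadow.predict_proba(X_in)); membership.append(np.ones(len(X_in)))     # members
        feats.append(shadow.predict_proba(X_out)); membership.append(np.zeros(len(X_out)))  # nonmembers
    # Binary attack classifier: confidence vector -> member / nonmember.
    return LogisticRegression(max_iter=1000).fit(np.vstack(feats), np.concatenate(membership))

def infer_membership(attack_model, target_predict_proba, X_candidates):
    # Query the victim model and decide membership for each candidate record (1 = member).
    return attack_model.predict(target_predict_proba(X_candidates))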

Although previous research has focused on MIA as an attack, recent works [7, 8, 16, 17] have used MIA as a privacy leakage assessment tool. Jayaraman et al. [16] used membership inference (MI) to quantify the privacy loss of different ML models under differential privacy. The MI-based evaluator helps find the range of differential privacy parameters that balances utility and privacy and also evaluates the privacy leakage of training data at risk of exposure. Song et al. [17] exploited MI to evaluate the adversarial regularization [7] and MemGuard [8] defenses with a new metric called the privacy risk score. Jia et al. [8] and Nasr et al. [7] used neural network classifiers to evaluate their mitigation performance. Thus, we choose MI as the classical privacy leakage evaluator to verify the effectiveness of our defense framework.

2.3. Machine Learning Privacy-Preserving

Existing ML privacy-preserving techniques can be classified into three categories: generalization methods [5, 15], differential privacy [16, 18, 19], and adversarial methods [7–9].

2.3.1. Generalization Method

Previous works show that an overfitted model can memorize information about its training data, which in turn causes privacy leakage risks such as MIA. Thus, generalization is one of the most popular mitigations against ML privacy risks. Salimans et al. [20] presented weight normalization, a simple reparameterization of the weight vectors in a neural network that speeds up the convergence of stochastic gradient descent. Srivastava et al. [21] used dropout to address overfitting in deep neural networks with large numbers of parameters. Shokri et al. [22] and Salem et al. [15] found that dropout is effective in reducing overfitting and strengthening privacy protection only in neural networks. Salem et al. [15] exploited model stacking as a generalization method suitable for all ML models. Shokri et al. [5] utilized standard regularization to mitigate the privacy risk caused by overfitting. However, Long et al. [14] discovered that overfitting is a sufficient but not necessary condition for MIA to succeed and found that existing generalization techniques are therefore of limited effectiveness in protecting against MIAs.

2.3.2. Differential Privacy

Differential privacy has been regarded as a strong privacy standard [23–27] and is one of the classical defenses against privacy risks. Depending on where it is applied in the ML pipeline, the differential privacy mechanism can take the form of input perturbation, gradient perturbation, objective perturbation, or output perturbation [28]. Reference [29] presented a differentially private GAN model that adds a Gaussian noise layer to the discriminator of a generative adversarial network so that the outputs and gradients are differentially private with respect to the training data. Papernot et al. [30] used Private Aggregation of Teacher Ensembles (PATE) to construct an output perturbation mechanism. Reference [31] used the differentially private stochastic gradient descent algorithm (DP-SGD) to prevent memorization. Although differential privacy has a significant effect on privacy protection, introducing differential privacy mechanisms greatly reduces the classification performance of the target model, which hampers its application in real-world scenarios [16].
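To make the gradient perturbation idea concrete, the following minimal NumPy sketch shows the core DP-SGD step of clipping each per-example gradient and adding Gaussian noise; it is a simplified illustration of the mechanism used in [31], not that reference's implementation, and the hyperparameter values are arbitrary.

import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
    # Clip each example's gradient so no single record dominates the update.
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    # Add Gaussian noise calibrated to the clipping norm, then average and update.
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_grad = (np.sum(clipped, axis=0) + noise) / len(per_example_grads)
    return params - lr * noisy_grad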

2.3.3. Adversarial Methods

Adversarial method-related defenses comprise two branches of research: adversarial training and adversarial examples. Nasr et al. [7] put forward a min-max game that designs an adversarial training algorithm minimizing the prediction loss of the model as well as the maximum gain of the inference attack. This strategy guarantees membership privacy and also helps to generalize the target model. Jia et al. [8] proposed a method based on adversarial examples that adds noise to each confidence score vector output by the victim classifier; its aim is to mislead the attack model with carefully crafted adversarial samples. MemGuard [8] can effectively defend against MIAs and achieves better privacy-utility tradeoffs than previous works. However, Nasr et al. and Jia et al. consider the data owner and the model provider as one role, so these mitigations are not suitable for data sharing or MLaaS scenarios in which the data owner must reveal their sensitive data to the service. In addition, there are many methods and algorithms for adversarial training and adversarial examples [33–39]. Therefore, it is necessary to construct a general defense framework that can adapt to different adversarial example algorithms, find a balance between privacy protection and the performance of the target model, and cover a wider range of privacy risk scenarios.

3. Taxonomy of Adversarial Method-Based Defense

In general, adversarial method-based defenses can be categorized into three types according to the perturbation object: the input training data, the model, and the output prediction vector. We categorize the existing and the proposed adversarial method-based defenses in Table 1 and describe each type in the following paragraphs.

3.1. Output Perturbation-Based Defenses

This kind of defense adds crafted noise to the confidence score vector or to labels to synthesize adversarial examples that mislead the attack classifier. For example, MemGuard [8] adds a noise vector to the confidence score vector with a certain probability to defend against black-box membership inference attacks. Yang et al. [32] proposed a framework that purifies the confidence score vectors by reducing their dispersion; this purification framework can defend against both the model inversion attack and the membership inference attack. Both Jia et al. [8] and Yang et al. [32] consider the data owner and the model provider as one role, an assumption that does not fit the scenario where users upload their local data to train a model remotely. This type of defense effectively mitigates privacy threats in which the adversary infers the reconstruction or the membership of a training record from the probabilities predicted by the victim classifier.
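The basic idea of output perturbation can be sketched as follows: noise is added to the confidence score vector while the predicted label is preserved. The sketch below uses random noise for simplicity; MemGuard instead crafts the noise vector adversarially so that it misleads the attack classifier.

import numpy as np

def perturb_confidence(scores, scale=0.1, rng=np.random.default_rng(0)):
    # Add noise to a confidence vector without changing the predicted label.
    label = int(np.argmax(scores))
    for _ in range(100):                          # resample until the label is preserved
        noisy = np.clip(scores + rng.normal(0.0, scale, size=scores.shape), 1e-6, None)
        noisy = noisy / noisy.sum()               # renormalize to a probability vector
        if int(np.argmax(noisy)) == label:
            return noisy
    return scores                                 # fall back to the unperturbed vector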

3.2. Model Perturbation-Based Defenses

In this defense type, defenders exploit model compression or adversarial training of the model to reduce privacy risks. For instance, Wang et al. [9] jointly formulated model compression and MIA defense as MCMIA to reduce the information leakage from MIA. The min-max method [7] designs an adversarial training algorithm that minimizes the classification loss of the victim model and the maximum gain of the attack model. These defenses can mitigate the privacy risk caused by model overfitting, but they are hard to deploy on a public MLaaS platform.

3.3. Input Perturbation-Based Defenses

Most proposed adversarial method-based defenses focus on output perturbation and model regularization to reduce privacy leakage. As Figure 1 shows, neither output perturbation nor model perturbation schemes can defend against privacy risks from a curious model provider. Input perturbation-based defenses, which add noise directly to the training data, can solve this problem. However, it is a challenge to find the balance between privacy-preserving and the utility of the classification model.

4. Generic Framework for Adversarial Method-Based Defenses

In this section, we begin by describing privacy leakage in the remote model training (MLaaS) scenario and formulate the problem of defending against these threats. Then, we design the generic defense framework for adversarial method-based defense. In addition, we analyze the constraint conditions of the framework. Finally, we introduce three adversarial algorithms that implement mitigations based on the framework.

4.1. Problem Formulation

In a typical MLaaS application, there are four parties: user, model provider, attacker, and defender. We discuss each party as follows.

4.1.1. User

There are two kinds of users in the MLaaS scenario. The first kind of user owns sensitive training data, such as facial images, healthcare records, financial information, and individual performance records. Because such a user does not have sufficient computation resources, they upload their local data to a remote MLaaS platform to train a machine learning classifier and then obtain the trained classifier for their applications. The second kind of user directly queries a classifier deployed on the MLaaS platform with their own examples and obtains the classification results from the remote service. In both cases, local data are uploaded to the remote service platform.

4.1.2. Model Provider

The model provider is typically a public MLaaS platform with two capabilities. First, it supplies the initial model and computation resources so that users can train a model tailored to their data. Second, it offers a trained model that users can query, returning the classification result (the confidence score vector) to the user. We refer to this machine learning model as the target model (victim model).

4.1.3. Attacker

An attacker aims to obtain information about a user's sensitive data. The adversary may be a malicious user who aims to infer information about the training data. For instance, the adversary can use a black-box MIA [5, 6] to infer the members of the training dataset by querying the target model; such attackers construct a binary classifier that infers, from the target model's confidence vector, whether a queried data record belongs to the target model's training dataset. Alternatively, the attacker may be a curious MLaaS server that wants to learn the details of the user's uploaded data. During training and querying, the user's data records are directly exposed to the server, so such an attacker can easily achieve its goal.

4.1.4. Defender

The defender could be the user (data owner) or a trusted third party that has access to the user's uploaded data X. For any training or query task from the user to the model provider, the defender directly perturbs the uploaded data to hide sensitive information and thus prevents adversaries from obtaining exposed data or launching inference attacks. The defender has two goals: first, to protect the user's sensitive data from direct exposure risks and from inference attacks such as MIAs; second, to find the balance between privacy-preserving and the target model's classification utility.

4.2. Framework Architecture

To defend against privacy threats in remote ML service scenarios, we propose the defense framework depicted in Figure 2. It consists of three components: (1) an adversarial perturbation generator that crafts adversarial examples to hide the sensitive information in the uploaded data, (2) a simulator that mimics the classification probabilities of the target model, and (3) a privacy leakage evaluator that measures the extent of privacy leakage. The evaluator gives feedback to the generator to balance privacy protection against the target model's performance.

4.2.1. Adversarial Perturbation Generator

Although adversarial examples are treated as harmful in many poisoning scenarios, they can be used to achieve our privacy protection goal. The fundamental idea is to generate an adversarial dataset in which the sensitive information of the original data is masked. As shown in Figure 2, the adversarial perturbation generator's goal is to find a set of adversarial examples that conceal the sensitive information of the original uploaded dataset X. In consideration of the target model's classification performance, the adversarial distortion should be as slight as possible; that is, the Euclidean distance between each adversarial record and its original record should be small. We therefore introduce a distance threshold ε and require the distance to stay below ε, which helps the generator find appropriate adversarial examples to upload for training the simulator. Many adversarial algorithms can be used, for instance, AdvGAN [40, 41], the Fast Gradient Sign Method (FGSM) [34], and OPTMARGIN [42]; the adversarial algorithm is the basic building block of the adversarial perturbation generator, and the algorithms used in this paper are introduced in Section 5.1. We also introduce a parameter α, the perturbation rate, which controls the ratio of adversarial examples in the uploaded training data.
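A minimal sketch of this selection step is shown below: an α fraction of the records is replaced with adversarial candidates, and a candidate is kept only if its Euclidean distance to the original record is at most ε. The adversarial_fn callable stands for any of the algorithms in Section 5.1, and the helper names are illustrative.

import numpy as np

def build_upload_set(X, adversarial_fn, eps=100.0, alpha=1.0, rng=np.random.default_rng(0)):
    # Replace an alpha fraction of X with adversarial examples whose L2 distortion is <= eps.
    n = len(X)
    perturb_idx = rng.choice(n, size=int(alpha * n), replace=False)
    X_upload = X.copy()
    for i in perturb_idx:
        x_adv = adversarial_fn(X[i])              # candidate from the chosen adversarial algorithm
        if np.linalg.norm(x_adv - X[i]) <= eps:   # keep only low-distortion candidates
            X_upload[i] = x_adv
    return X_upload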

4.2.2. Simulator

The simulator consists of one or several ML models that mimic the probability distribution of the target model. Shokri et al. [5] and Salem et al. [15] explored how to construct shadow models that mimic a target model effectively, and we use the same method to construct the simulator. In our framework, we aim to generate adversarially perturbed data with two properties: (1) the perturbed data keeps the same classification label as the original record, and (2) the perturbed data can be used to train the simulator while minimizing its loss function.
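The simulator can be sketched as a small ensemble of shadow classifiers trained on the perturbed upload set; their averaged confidence vectors stand in for the target model's output. The scikit-learn MLP below is only a placeholder for whatever architecture the target model uses.

import numpy as np
from sklearn.neural_network import MLPClassifier

class Simulator:
    def __init__(self, n_shadow=5):
        # Several shadow models approximate the target model's probability distribution.
        self.shadows = [MLPClassifier(max_iter=200, random_state=s) for s in range(n_shadow)]

    def fit(self, X_perturbed, y):
        for shadow in self.shadows:
            shadow.fit(X_perturbed, y)            # trained on the perturbed upload set
        return self

    def predict_proba(self, X):
        # Average the shadow models' confidence vectors as the simulated target output.
        return np.mean([s.predict_proba(X) for s in self.shadows], axis=0)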

4.2.3. Privacy Leakage Evaluator

We exploit the privacy leakage evaluator to measure the privacy risk level and thus evaluate the privacy-preserving effectiveness of the adversarial example generation module. The evaluator can adapt to multiple evaluation plugins; in this paper, we use the membership inference method as the typical plugin to verify the performance of our defense framework. The evaluator is an MI classifier that infers whether an example is a member of the simulator's training set. In addition, this module sends the evaluation result to the adversarial perturbation generator to help it find appropriate adversarial examples. The assessment considers several constraint conditions, for example, the generalization gap, the decrease in MI accuracy, and the loss function of the MI classifier. These constraint conditions are sent to the adversarial perturbation generator to guide the adjustment of its noise production parameters: if the MI inference accuracy is too high, the generator is adjusted toward stronger perturbation; otherwise, it is adjusted toward weaker perturbation.
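A minimal version of this feedback rule is sketched below: the MI evaluator is run on the simulator's confidence vectors for known members and nonmembers, and the generator's perturbation strength is nudged up when the inference accuracy is still too high and relaxed otherwise. The threshold and step size are illustrative, not tuned values from our experiments.

import numpy as np

def feedback_step(mi_evaluator, simulator, X_members, X_nonmembers, perturb_scale,
                  step=0.1, target_acc=0.55):
    # Measure the MI evaluator's accuracy on the simulator's outputs.
    feats = np.vstack([simulator.predict_proba(X_members), simulator.predict_proba(X_nonmembers)])
    truth = np.concatenate([np.ones(len(X_members)), np.zeros(len(X_nonmembers))])
    mi_acc = (mi_evaluator.predict(feats) == truth).mean()
    if mi_acc > target_acc:                        # leakage too high: perturb more strongly
        return perturb_scale + step, mi_acc
    return max(perturb_scale - step, 0.0), mi_acc  # otherwise relax to preserve utility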

4.2.4. Defense for MIA

The defense framework can be specialized to the specific scenario of mitigating the MIA when the user uploads local data to the remote machine learning service. In this paper, we consider the MIA proposed by Shokri et al. [5], in which the user supplies training data as input to the MLaaS service (model provider) to construct a model from the uploaded data and then uses the model in other applications. In this scenario, the adversary can query the model and obtain the classification results.

Algorithm 1 shows the procedure in this case. The adversarial perturbation generator is trained to synthesize a set of training data that masks the data's information from the MI adversary (Goal I), while the performance of the simulator trained on these synthesized data does not degrade heavily (Goal II). The simulator and the evaluator are auxiliary tools that help the generator produce adversarial examples satisfying Goal I and Goal II.
We formally state this MIA defense problem as the following optimization problem.
Let x be a record of the user's original data X, and let x′ be the adversarial example generated from x. The Euclidean distance is used to measure the adversarial distortion between the adversarial example and the original data, and the distance threshold ε is used as the upper bound of the adversarial distortion:

‖x′ − x‖₂ ≤ ε.  (1)

We choose only adversarial examples whose distortion is smaller than the threshold ε to construct the uploaded training dataset. We introduce the perturbation rate α to control the degree of adversarial perturbation. Let N denote the size of the uploaded dataset; the number of adversarial examples that the generator needs to synthesize is then αN. When α = 100%, all the uploaded training data are generated adversarial examples. The value of α can be set empirically.
We aim to maintain the performance of the simulator. Thus, following Jia et al. [8], we keep the classification label of the perturbed data identical to that of the original data. Let f be the classification function of the simulator, whose output is a confidence vector; the classification labels of the original and adversarial data are argmax_c f(x)_c and argmax_c f(x′)_c, respectively, and we require

argmax_c f(x′)_c = argmax_c f(x)_c.  (2)

Let X′ be the perturbed uploaded dataset, Y the label set of the original examples X, and (x′_i, y_i) the i-th training example. The cross-entropy loss of the simulator is

L_f(X′, Y) = −(1/N) Σ_{i=1}^{N} Σ_c y_{i,c} log f(x′_i)_c.  (3)

We aim to minimize the loss function of the simulator,

min_f L_f(X′, Y),  (4)

to guarantee that the mitigation has minimal influence on the simulator. Let I be the MI evaluator's model with loss L_I. The optimization problem for defending against the MIA is to maximize the loss function of the evaluator:

max_{X′} L_I(I, f, X′).  (5)

Input: user's original data X for remote training; the test dataset for evaluation; α is the perturbation rate;
X′ is the dataset constructed from the adversarial examples synthesized by the generator;
N is the size of X;
ε is the Euclidean distance threshold between an adversarial example and the original data; m is the maximum number of adversarial perturbation rounds.
Output: h, X′
init the adversarial generator h, the simulator's model f, the evaluation model I;
X′ = ∅;
randomly choose αN records of X as the set to be perturbed;
initialize the best loss L_k = +∞ for each chosen record x_k;
for(i = 0; i < m; i++){ //generate the adversarial examples
 Train(h);
 for(j = 0; j < S; j++){ //S is the number of shadow models in the simulator
  synthesize a candidate adversarial example x′_k with h for each chosen record x_k;
  for(k = 0; k < size(X′); k++){
   d_k = ‖x′_k − x_k‖₂; //Euclidean distance
   if(d_k ≤ ε){
    l_k = loss of the simulator on x′_k;
    if(l_k < L_k){
     L_k = l_k;
     keep x′_k as the adversarial example for x_k in X′;
    }
   }
  }
  train the shadow model f_j on X′ and run the evaluation model I;
  if(cross entropy(f_j(X′), Y) > the cross entropy of the previous round)
   feed the evaluation result back to h and adjust its perturbation parameters;
 }
}
Return h, X′

5. Implementation and Evaluation

In this section, we present the implementation details, including the adversarial algorithms, models, and datasets. We then report the experimental results and analyze our defense framework.

5.1. Adversarial Algorithms
5.1.1. Generative Adversarial Networks- (GANs-) Based Method

The GANs-based method generates adversarial examples with generative adversarial networks (GANs) [41]. Because GANs can learn and approximate the distribution of the original records, this method can generate perturbations efficiently. Let G be the generator of AdvGAN and f the target model. The original instance x is fed into the generator G to produce a perturbation G(x), and the perturbed instance x + G(x) is then input to the discriminator D. D aims to encourage the generator G to synthesize instances that are indistinguishable from the original data, while the loss on the target model f drives the perturbed instance toward the desired classification result.
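The following PyTorch sketch shows one simplified AdvGAN-style training step: the generator proposes a bounded perturbation, the discriminator is updated to tell original from perturbed instances, and the generator is updated against the discriminator, the target model, and a distortion penalty. The loss weights and clamping ranges are illustrative, and in our framework the target-model term would be adapted to the label-preserving constraint of Section 4.2 rather than the untargeted attack loss shown here.

import torch
import torch.nn.functional as F

def advgan_step(G, D, target_model, x, y, opt_G, opt_D, c=0.1, eps=0.3):
    # Generator proposes a bounded perturbation G(x); x_adv stays in the valid input range.
    perturbation = torch.clamp(G(x), -eps, eps)
    x_adv = torch.clamp(x + perturbation, 0.0, 1.0)

    # Discriminator (assumed to end with a sigmoid): original x is real (1), x_adv is fake (0).
    d_real = D(x)
    d_loss = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy(D(x_adv.detach()), torch.zeros_like(d_real))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator: look real to D, push the target model away from y, keep the distortion small.
    d_fake = D(x_adv)
    gan_loss = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    adv_loss = -F.cross_entropy(target_model(x_adv), y)
    hinge_loss = perturbation.flatten(1).norm(p=2, dim=1).mean()
    g_loss = gan_loss + c * adv_loss + c * hinge_loss
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return x_adv.detach()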

5.1.2. Fast Gradient Sign Method (FGSM)

FGSM was first proposed by Goodfellow et al. [34]. Let θ denote the model parameters, x the original data, y the true label of x, J(θ, x, y) the loss function used to train the model with respect to the inputs, and ε the distortion parameter that controls the perturbation level.

The adversarial example can be expressed as x′ = x + ε · sign(∇_x J(θ, x, y)).

FGSM is a fast and reliable method to generate adversarial examples, and it causes more noticeable perturbations than other adversarial algorithms. This attribute can be used to protect the original data from direct content exposure attacks.
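A minimal PyTorch sketch of this construction is given below; the model is assumed to return logits, the inputs are assumed to lie in [0, 1], and ε is the distortion parameter from the equation above.

import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=0.25):
    # Craft x' = x + eps * sign(grad_x J(theta, x, y)).
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return torch.clamp(x_adv, 0.0, 1.0).detach()   # keep pixels in the valid [0, 1] range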

5.1.3. OPTMARGIN-Based Defense

He et al. [42] proposed the OPTMARGIN adversarial method, which can generate low-distortion adversarial examples that are robust to small perturbations. OPTMARGIN attacks a surrogate of a region classifier that uses a smaller number of perturbations: given the point classifier f and small perturbations v_i applied to the original data x, the surrogate is the ensemble of classifiers f_i(x) = f(x + v_i). A loss term is defined for each model f_i, and the adversarial example is obtained by minimizing the sum of these loss terms together with the distortion between the adversarial example and the original data.
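The sketch below captures this idea loosely: a single perturbation δ is optimized so that the classifier is fooled under several fixed small random offsets v_i while the distortion is penalized. It follows the spirit of the description above rather than He et al.'s exact objective, and all hyperparameters are illustrative.

import torch
import torch.nn.functional as F

def optmargin_style_example(model, x, y, n_models=5, noise=0.05, eps=0.3, steps=40, lr=0.01):
    # Fixed small offsets v_i define the surrogate ensemble f_i(x) = f(x + v_i).
    offsets = [noise * torch.randn_like(x) for _ in range(n_models)]
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        # Push every perturbed copy away from the true label y, and keep the distortion low.
        loss = sum(-F.cross_entropy(model(torch.clamp(x + delta + v, 0.0, 1.0)), y) for v in offsets)
        loss = loss + delta.pow(2).sum()
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)
    return torch.clamp(x + delta, 0.0, 1.0).detach()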

5.2. Datasets
MNIST (http://yann.lecun.com/exdb/mnist/): we use MNIST [43] for our experiments, which consists of 60000 training images and 10000 test images, all drawn from the same distribution. All these black-and-white digits are size-normalized and centered in a fixed-size 28 × 28 pixel image, where the center of gravity of the intensity lies at the center of the image.
CIFAR10 (https://www.cs.toronto.edu/kriz/cifar.html): the CIFAR10 dataset consists of 60000 32 × 32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The dataset is divided into five training batches and one test batch, each with 10000 images.
Data separation: in our experiments, we randomly select 5000 records as the original instances used to generate adversarial examples and 5000 records as test data for the simulator. We randomly select another 5000 records as the evaluation data for the inference evaluator.
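The data separation can be reproduced with a simple index split such as the sketch below; the total size and the random seed are illustrative, and loading MNIST or CIFAR10 itself is left to any standard loader.

import numpy as np

def split_indices(n_total=60000, n_adv=5000, n_test=5000, n_eval=5000, seed=0):
    # Disjoint random subsets: adversarial-source records, simulator test data, evaluator data.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_total)
    adv_idx = idx[:n_adv]
    test_idx = idx[n_adv:n_adv + n_test]
    eval_idx = idx[n_adv + n_test:n_adv + n_test + n_eval]
    return adv_idx, test_idx, eval_idx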
5.3. Performance of Our Defense Framework

Table 2 shows the performance of our defense framework against MIA. We analyze the training accuracy and test accuracy of the target model and the attack accuracy of the MI model on two public datasets (CIFAR10 and MNIST). In this experiment, we use the AdvGAN algorithm. We find the following: (1) our defense framework can restrain the inference accuracy. On CIFAR10, our mitigation reduces the attack accuracy from 93.71% to 51.4%, while on MNIST, it decreases the attack accuracy from 89% to 52.5%. As these attack accuracies are close to random guessing (50%), the mitigation is effective. (2) Our defense can improve the test accuracy of the target classifier. Interestingly, even though the training accuracy declines after applying our defense, the test accuracy of the target model improves: the test accuracy on CIFAR10 increases from 62.24% to 69.7%, and the test accuracy on MNIST increases from 57.9% to 91.12%.

5.4. Comparison with Existing Methods

We compare our method with existing methods, including the min-max method [7] and differential privacy [16]. We use CIFAR10 as the dataset, use the AdvGAN algorithm, set the privacy budget ε = 50 for the differential privacy mitigation, and set the adversarial distance of our method to 23. Table 3 shows that all three defenses can restrain the MIA accuracy to close to 50% after 50 epochs. Among the three mitigations, the inference accuracy under differential privacy is the lowest; however, its training accuracy and test accuracy are only 1.2% and 1%, respectively. Under the min-max method, the inference accuracy, training accuracy, and test accuracy are 52.9%, 68.6%, and 62.7%, respectively. Under our method, the inference accuracy, training accuracy, and test accuracy are 51.94%, 78.3%, and 69.7%, respectively. Our method achieves MI privacy with the minimum utility cost to the target classifier.

5.5. Different Adversarial Algorithm-Based Privacy-Preserving Capability Evaluation
5.5.1. Protection against Content Disclosure Attacks

We visualize MNIST examples generated by the different adversarial algorithms under our defense framework. As shown in Figure 3, the AdvGAN-based method produces the clearest records compared with the FGSM- and OPTMARGIN-based ones. The perturbations synthesized by the FGSM-based method are more noticeable, making the content of these outputs unrecognizable. This attribute can be used to protect sensitive information from direct content exposure risks posed by a curious model provider platform.

5.5.2. Privacy-Preserving

Figure 4 shows the MIA accuracy for the various adversarial construction methods: AdvGAN, OPTMARGIN, and FGSM. In Figure 4(a), the adversarial instances are generated without the constraint conditions introduced in Section 4.2. We run this experiment on MNIST, set the distance threshold to 100, and set the distortion rate to 100%; in other words, we train the target model entirely with adversarial instances. In Figure 4(b), the adversarial examples are synthesized by our defense framework (with the constraint conditions of Section 4.2). The results indicate that our constraint conditions improve the defense performance of the AdvGAN- and FGSM-based mitigations. However, the effectiveness of the OPTMARGIN-based defense is not satisfying; the reason might be that the parameters of our defense framework need to be tuned for each algorithm. The relationship between the parameters and the defense performance is further discussed in Section 5.6.

5.5.3. Evaluation of Privacy and Utility

In Figure 5, we compare the training accuracy and test accuracy of the target classification model under the different adversarial algorithms. We run this experiment on MNIST, set the distance threshold to 100, and set the distortion rate to 100%; in other words, we train the target model entirely with adversarial instances. As we can see, the OPTMARGIN method has the lowest test accuracy, but its training accuracy is higher than that of FGSM. The AdvGAN algorithm has the highest training accuracy and test accuracy after 100 epochs; these values are even higher than those without defense. The FGSM algorithm has nearly identical training and test accuracy, and after 100 epochs these values are close to those without defense.

5.6. Analysis of the Influence Factors

We analyze the factors that influence our generic defense framework, including the Euclidean distance threshold and the perturbation rate. We use AdvGAN as the adversarial algorithm, MNIST as the original training data, a training size of 5000, and 5 shadow models.

5.6.1. Distance Threshold

Figure 6 shows that privacy risks vary under different Euclidean distance thresholds. The perturbation rate is 100%, and we record the MIA accuracy every five epochs over 100 epochs. As shown in Figure 6, when the distance threshold is 50, the attack accuracy converges after 45 epochs to between 52% and 54%. When the threshold is 200, 150, or 100, the attack accuracy increases gradually between the 1st and 10th epochs, decreases between the 10th and 20th epochs, and increases slowly thereafter. According to Figure 6, the greater the Euclidean distance threshold, the higher the MIA accuracy and the greater the risk of a privacy breach; conversely, the smaller the threshold, the better the defense. In particular, a threshold of 50 gives the best defense against MIA, with accuracy close to random guessing (50%).

5.6.2. Distortion Rate

In this experiment, the Euclidean distance threshold is 100, and Figure 7 reports the MIA accuracy every five epochs over 100 epochs. As shown in Figure 7, the MIA accuracy is lower when the perturbation rate is 100% than when it is 80%, 60%, 40%, or 20%; that is, the least private information is disclosed. The MIA accuracy is highest when the perturbation rate is 20%, meaning that the most private information is leaked. As Figure 7 shows, the greater the perturbation rate, the less privacy is exposed, and the smaller the perturbation rate, the more privacy is exposed.

6. Conclusion and Future Work

We have designed and implemented a defense framework for privacy-preserving in remote machine learning services. Our aim was a framework that not only mitigates privacy risks such as MIAs but also protects the training data from direct content exposure attacks, using different adversarial algorithms. The evaluation results show that our defense framework with the AdvGAN method is effective against MIA and that our framework with the FGSM method can protect the sensitive original data from direct content exposure attacks. In addition, our method achieves a better privacy-utility balance than existing methods such as the min-max method and differential privacy. Furthermore, the experimental results show that we can control the defense performance with parameters such as the distortion rate and the distance threshold.

In this paper, we evaluate the performance of our defense framework only against MIA. Exploring the framework against other privacy threats, such as model inversion, is left as an open question. For future work, we plan to investigate these threats more comprehensively and refine our countermeasures.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.