K-Anonymity inspired adversarial attack and multiple one-class classification defense
Introduction
In image classification tasks (e.g., face/object recognition), the term adversarial examples refers to crafted images that appear to the human eye almost indistinguishable from the original samples, while being misclassified by the respective image classifier. The process of crafting such examples is the so-called adversarial attack. Many classification models, including those based on Convolutional Neural Networks (CNN), have been found to be vulnerable to adversarial attacks (Goodfellow et al., 2015, Nguyen et al., 2015, Szegedy et al., 2013). Furthermore, recent studies (Moosavi-Dezfooli et al., 2017, Papernot, McDaniel, Goodfellow, 2016, Zhou et al., 2018) have shown that adversarial attacks have the property of transferability, i.e., carefully crafted adversarial examples may deceive several classification methods at the same time, ranging from similar deep architectures to totally different classification methods, such as Support Vector Machines or Random Forests.
The research community has been actively developing adversarial attack methodologies over the past few years, as well as methodologies to counter these attacks. Different types of adversarial attacks are distinguished in the literature, depending on the level of information available to the adversary prior to the attack. In most cases, a white-box attack is assumed, i.e., the adversary has full knowledge of the model architecture to be deceived, including access to the values of the respective model weights. The adversary can therefore query the model and backpropagate gradients for the given inputs by employing appropriate loss functions. In the black-box attack case, it is assumed that the adversary has limited or no information about the model architecture, other than its output classification labels. White-box attacks may still be applied against model architectures unknown to the adversary, by employing intermediate/reference architectures (known to the adversary) and exploiting the property of transferability (Papernot, McDaniel, Goodfellow, 2016). The design of adversarial defense methods to repulse these attacks has proven a lot more challenging than initially anticipated (Carlini & Wagner, 2017a). Several recent adversarial defense methods were based on obfuscating the model gradients (Papernot, McDaniel, Wu, Jha, Swami, 2016) for the given inputs, repulsing a number of known adversarial attack methodologies in white-box setups. However, it was later found that defenses relying on obfuscated gradients are significantly less effective against newer and stronger attacks, or against transferability attacks generated with similar undefended architectures, and can nowadays be easily overcome by the adversary (Athalye, Carlini, & Wagner, 2018).
In this paper, we consider that adversarial attacks should not only be viewed negatively as methods for fooling deep neural networks, as they have also been used in other applications, notably in protecting private data from automated analysis by recognition systems, which are typically used by service providers in social media (Batrinca & Treleaven, 2015). For example, adversarial attacks have been employed to disable known automatic face detection/recognition algorithms applied on visual data uploaded by social media users (Liu, Zhang, & Yu, 2017), without severely compromising image quality (Oh, Fritz, & Schiele, 2017), while at the same time not hiding the persons' identities from human viewers. Moreover, adversarial attack methods could potentially be used to protect data captured from publicly installed cameras or even IoT sensors (e.g., UAV/surveillance/car cameras). However, to the best of our knowledge, unlike standard privacy protection methods (Sikiric, Hrkac, Kalafatic, et al., 2017), adversarial attack methodologies do not incorporate privacy protection-related constraints in their optimization process. Therefore, even if they are successful in disabling face detection/recognition against a specific algorithm, there are no guarantees that adversarial attacks are effective for protecting people's privacy when employed against automated classification systems to this end.
On the other hand, adversarial attacks could potentially be used for malicious purposes against classification systems in sensitive applications, e.g., biometrics, forensics, spam/fault detection systems, or even copyright protection systems. Classification systems that are not robust against adversarial attacks may be rendered unreliable for real-world deployment. In order to measure the potential threat, adversarial attacks could be employed by the classification system engineer as a tool to expose innate classification model weaknesses, e.g., over-fitting, since their application reveals the decision noise tolerance, which is directly related to the amount of additive noise required to flip the classification decision. However, in order to be protected against adversaries, novel defense mechanisms should be employed in such classification systems that not only hinder adversarial example crafting, but also increase the model's tolerance to noise and/or at least detect/prevent adversarial attacks as a last-resort option. To this end, we argue that one-class classification methods such as the Support Vector Data Description (Tax & Duin, 2004) can be used as an additional mechanism to verify whether the input samples belong to one of the training classes.
Motivated by these potential applications, we propose extending the use of adversarial attacks to fool deep neural network classifiers in a privacy-preserving manner, along with a novel defense mechanism to counter them. Two concepts are introduced, namely the K-Anonymity-inspired Adversarial Attack (K-A) and the Multiple Support Vector Data Description (M-SVDD) Defense, based on Tax and Duin (2004). The novel contributions of this work can be summarized as follows:
- A novel adversarial attack optimization problem is proposed that exploits and extends well-known adversarial attack methodologies, by modifying the optimization conditions for generating the adversarial examples. The proposed optimization problem is inspired by K-Anonymity principles, assuring that the initial identities of the crafted adversarial examples are not only misclassified by the neural network decision function, but are also uniformly spread along different ranked output positions.
- In order to minimize the perturbation introduced by our adversarial attack, a visual similarity loss is introduced that guides the adversarial attack towards image pixel value modifications having minimal impact on the perceived image quality. The CW-SSIM loss (Wang, Bovik, Sheikh, & Simoncelli, 2004) is employed to this end.
- A novel deep neural layer composed of a number of novel deep non-linear one-class classifiers (the SVDD layer), equal to the number of classes supported by the model to be protected, is proposed as an adversarial defense mechanism. The parameters of the SVDD layer are trained by exploiting novel loss functions inspired by the Support Vector Data Description (Tax & Duin, 2004). The SVDD layer is thereby used to replace the standard linear classification layer of a pre-trained reference deep neural architecture, introducing non-linearity to the classifier decision function.
- In black-box attack settings, the proposed defense mechanism acts as an input verification mechanism, i.e., it ensures that if an input data vector does not belong to any of the training classes (i.e., it is classified as an outlier by every SVDD classifier), it is treated as an adversarial example. The proposed defense mechanism can operate merely as a post-processing step after model inference, and does not hinder the simultaneous application of other defense mechanisms.
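The K-Anonymity-inspired requirement from the first contribution can be made concrete with a simple check. The sketch below is a hypothetical illustration (not the paper's actual optimization objective): given classifier scores for k adversarial variants of the same identity, the true class must be misclassified by every variant and pushed to k distinct ranked output positions.

```python
import numpy as np

def rank_positions(logits: np.ndarray, true_labels: np.ndarray) -> np.ndarray:
    """For each sample, return the rank (0 = top prediction) of its
    true class in the classifier's descending-sorted output scores."""
    order = np.argsort(-logits, axis=1)  # order[i, r] = class at rank r
    ranks = np.empty(len(true_labels), dtype=int)
    for i, label in enumerate(true_labels):
        ranks[i] = int(np.where(order[i] == label)[0][0])
    return ranks

def satisfies_k_anonymity(logits, true_labels, k):
    """A batch of k adversarial variants of one identity satisfies the
    (illustrative) K-Anonymity-style requirement if the true class is
    misclassified everywhere (rank > 0) and the ranks occupy k distinct
    output positions."""
    ranks = rank_positions(logits, true_labels)
    return bool(np.all(ranks > 0) and len(set(ranks)) == len(ranks) == k)
```

In this toy form, the check only verifies the rank-spreading property after the fact; in the proposed method the analogous conditions are enforced inside the attack's optimization problem.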
Section snippets
Background information and related work
Let x ∈ ℝ^D be a general data sample (e.g., a facial image for face recognition) having a discrete ground truth label c ∈ {1, …, C} representing one of the C classes corresponding to, e.g., facial image identities. Also let f be a neural network architecture consisting of L layers, having a trainable parameter set W = {W₁, …, W_L}, where W_l contains the lth layer weights, and also let g(·) be a classifier decision function that maps the input samples to C decision values, corresponding to each class. The sample …
Privacy protection against deep neural networks
Let X be an image classification domain that a neural network classifier f with trainable parameters W is very knowledgeable of, such that its decision function g(·) is able to map samples originating from this domain to their corresponding label space. That is, for any given sample x ∈ X with label c, the neural network classifier is able to recover the label by its decision function, e.g., g(x) = c. Adversarial attacks typically generate the perturbation as a mapping to a space of similar …
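The snippet above describes the general perturbation-generation setting. As a point of reference (this is the classical fast gradient sign method of Goodfellow et al., 2015, not the proposed K-Anonymity-inspired attack), a single-step perturbation can be sketched in a few lines of numpy on a softmax-regression model; the weights and inputs below are purely illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fgsm(x, label, W, b, eps):
    """One-step FGSM on a softmax-regression model: move x by eps in the
    sign of the input gradient of the cross-entropy loss for `label`."""
    p = softmax(W @ x + b)
    onehot = np.eye(W.shape[0])[label]
    grad_x = W.T @ (p - onehot)  # d(cross-entropy)/dx for this model
    return x + eps * np.sign(grad_x)
```

For deep networks the input gradient is obtained by backpropagation rather than in closed form, but the perturbation structure is the same.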
Multiple SVDD defense
The proposed M-SVDD-D method assumes having an undefended pretrained multi-class deep neural network architecture f with a parameter set W, consisting of L layers, where its final layer involves inference with a linear multiclass classifier layer supporting C classes. Our defense strategy involves creating a modified architecture f′ by replacing this linear classifier layer with C non-linear one-class classifiers, based on the SVDD method (Tax & Duin, 2004). Each one-class classifier acts …
Experiments
In order to prove the concepts and evaluate the performance of the proposed methods, we have performed two sets of experiments, corresponding to the adversarial attack and adversarial defense scenarios. We have employed four publicly available image classification datasets, namely MNIST (digit classification) (LeCun, Bottou, Bengio, & Haffner, 1998), CIFAR-10 (object recognition) (Krizhevsky & Hinton, 2009), Yale (face recognition) (Georghiades, Belhumeur, & Kriegman, 2001) and GTS (Traffic Sign …
Conclusion
In this paper, an adversarial attack method was developed that supports and respects the proposed K-Anonymity-inspired requirements. In addition, the defense mechanism proposed in this paper can be used to: (a) increase adversarial attack failure rates, (b) increase the noise energy that must be added to standard examples in order to deceive the classifier, and (c) prevent adversarial attacks. Moreover, it was shown that the optimization problem of the SVDD classifier can be effectively solved in its …
Acknowledgment
This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no 731667 (MULTIDRONE). This publication reflects only the authors' views. The European Commission is not responsible for any use that may be made of the information it contains.
References (55)

- Semi-supervised subclass support vector data description for image and video classification. Neurocomputing, 2018.
- Multi-label convolutional neural network based pedestrian attribute classification. Image and Vision Computing, 2017.
- Threat of adversarial attacks on deep learning in computer vision: A survey. IEEE Access, 2018.
- On the robustness of the CVPR 2018 white-box adversarial example defenses. 2018.
- Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples.
- Social media analytics: A survey of techniques, tools and platforms. AI & Society, 2015.
- Data and structural k-anonymity in social networks.
- Towards evaluating the robustness of neural networks.
- Adversarial examples are not easily detected: Bypassing ten detection methods.
- Adversarial attacks and defences: A survey. 2018.
- Keeping the bad guys out: Protecting and vaccinating deep learning with JPEG compression.
- Database anonymization: Privacy models, data utility, and microaggregation-based inter-model connections.
- On the Nyström method for approximating a Gram matrix for improved kernel-based learning. Journal of Machine Learning Research.
- Multilabel classification via calibrated label ranking. Machine Learning.
- From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Adam: A method for stochastic optimization.
- Learning multiple layers of features from tiny images. Tech. rep.
- Gradient-based learning applied to document recognition. Proceedings of the IEEE.
- Protecting privacy in shared photos via adversarial examples based stealth. Security and Communication Networks.