
Neural Networks

Volume 124, April 2020, Pages 296-307

K-Anonymity inspired adversarial attack and multiple one-class classification defense

https://doi.org/10.1016/j.neunet.2020.01.015

Abstract

A novel adversarial attack methodology for fooling deep neural network classifiers in image classification tasks is proposed, along with a novel defense mechanism to counter such attacks. Two concepts are introduced, namely the K-Anonymity-inspired Adversarial Attack (K-A3) and the Multiple Support Vector Data Description Defense (M-SVDD-D). The proposed K-A3 introduces novel optimization criteria to standard adversarial attack methodologies, inspired by the K-Anonymity principles. Its generated adversarial examples are not only misclassified by the neural network classifier, but are uniformly spread along K different ranked output positions. The proposed M-SVDD-D consists of a deep neural layer composed of multiple non-linear one-class classifiers based on Support Vector Data Description, which can replace the final linear classification layer of a deep neural architecture, and of an additional class verification mechanism. Its application decreases the effectiveness of adversarial attacks by increasing, through the introduced non-linearity, the noise energy required to deceive the protected model. In addition, M-SVDD-D can be used to prevent adversarial attacks in black-box attack settings.

Introduction

In image classification tasks (e.g., face/object recognition), the term adversarial examples refers to crafted images that appear to the human eye almost imperceptibly similar to the training samples, while being misclassified by the respective image classifier. The process of crafting such examples is the so-called adversarial attack. Many classification models, including the ones based on Convolutional Neural Networks (CNN), have been found to be vulnerable to adversarial attacks (Goodfellow et al., 2015, Nguyen et al., 2015, Szegedy et al., 2013). Furthermore, recent studies (Moosavi-Dezfooli et al., 2017, Papernot, McDaniel, Goodfellow, 2016, Zhou et al., 2018) have shown that adversarial attacks have the property of transferability, i.e., carefully crafted adversarial examples may deceive various classification methods at the same time, ranging from similar deep architectures to even totally different classification methods, such as Support Vector Machines or Random Forests.

The research community has been actively developing adversarial attack methodologies over the past few years, as well as methodologies to defend against these attacks. Different types of adversarial attacks are specified in the literature, depending on the level of information available to the adversary prior to the attack. In most cases, a white-box attack is assumed, i.e., the adversary has full knowledge about the model architecture to be deceived, including access to the values of the respective model weights. Therefore, the adversary is able to query the model and backpropagate gradients for the given inputs by employing appropriate loss functions. In the black-box attack case, it is assumed that the adversary has limited or no information about the model architecture, other than its output classification labels. White-box attacks may also be applied against model architectures unknown to the adversary, by employing intermediate/reference architectures (known to the adversary) and by exploiting the property of transferability (Papernot, McDaniel, Goodfellow, 2016). The design of adversarial defense methods to repulse these attacks seems to be far more challenging than initially anticipated (Carlini & Wagner, 2017a). Recent adversarial defense methods were based on obfuscating the model gradients (Papernot, McDaniel, Wu, Jha, Swami, 2016) for the given inputs, repulsing a number of known adversarial attack methodologies in white-box adversarial attack setups. However, it was later found that defenses relying on obfuscating gradients are significantly less effective against newer and stronger attacks, or against transferability attacks generated by employing similar undefended architectures, and can nowadays be easily overcome by the adversary (Athalye, Carlini, & Wagner, 2018).
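To make the white-box setting concrete, the snippet below is a minimal PyTorch sketch of the fast gradient sign method of Goodfellow et al. (2015), one of the attack methodologies referenced above. It illustrates gradient-based adversarial example crafting in general, not the attack proposed in this paper; `model`, `image` and `label` are placeholders assumed to be supplied by the reader.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    """One-step white-box attack: the adversary backpropagates the loss
    gradient through the (fully known) model and perturbs every pixel by
    +/- epsilon along the sign of that gradient."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    # Keep the result a valid image (pixel intensities in [0, 1]).
    return adversarial.clamp(0.0, 1.0).detach()
```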

In this paper, we consider that adversarial attacks should not only be viewed in a negative way as methods for fooling deep neural networks, as they have also been used in other applications, notably in protecting private data from automated analysis by recognition systems that are typically employed by service providers in social media (Batrinca & Treleaven, 2015). For example, adversarial attacks have been employed to disable known automatic face detection/recognition algorithms applied on visual data uploaded by social media users (Liu, Zhang, & Yu, 2017), without severely compromising image quality (Oh, Fritz, & Schiele, 2017), while, at the same time, not hiding the person identities from human viewers. Moreover, adversarial attack methods could potentially be used to protect data captured from publicly installed cameras or even IoT sensors (e.g., UAV/surveillance/car cameras). However, to the best of our knowledge, unlike standard privacy protection methods (Sikiric, Hrkac, Kalafatic, et al., 2017), adversarial attack methodologies do not incorporate privacy protection-related constraints in their optimization process; therefore, even if they are successful in disabling face detection/recognition against a specific algorithm, there are no guarantees that adversarial attacks are effective in protecting people’s privacy when employed against automated classification systems to this end.

On the other hand, adversarial attacks could potentially be used for malicious purposes against classification systems in sensitive applications, e.g., biometrics, forensics, spam/fault detection systems, or even copyright protection systems. Classification systems that are not robust against adversarial attacks may be rendered unreliable for real-world deployment. In order to measure the potential threat, adversarial attacks could be employed by the classification system engineer as a means to intuitively expose innate classification model weaknesses, e.g., over-fitting, since their application reveals the decision noise tolerance, which is directly related to the amount of additive noise required to result in a misclassification decision. However, in order to be protected against adversaries, novel defense mechanisms should be employed in such classification systems that not only hinder adversarial example crafting, but also increase the model’s tolerance to noise and/or, as a last resort, detect/prevent adversarial attacks. To this end, we argue that one-class classification methods such as the Support Vector Data Description (Tax & Duin, 2004) can be used as an additional mechanism to verify whether the input samples belong to one of the training classes.

Motivated by these potential applications, we propose an extension of the use of adversarial attacks in order to fool deep neural network classifiers in a privacy-preserving manner, along with a novel defense mechanism to counter them. Two concepts are introduced, namely the K-Anonymity-inspired Adversarial Attack (K-A3) and the Multiple Support Vector Data Description (Tax & Duin, 2004) (M-SVDD) Defense. The novel contributions of this work can be summarized as follows:

  • A novel adversarial attack optimization problem is proposed that exploits and extends well-known adversarial attack methodologies, by modifying the optimization conditions for generating the adversarial examples. The proposed optimization problem is inspired by K-Anonymity principles, ensuring that the crafted adversarial examples are not only misclassified by the neural network decision function, but that their initial identities are uniformly spread along K different ranked output positions (an illustrative sketch combining this criterion with the visual similarity loss of the next item is given after this list).

  • In order to minimize the introduced perturbation by our adversarial attack, a visual similarity loss is introduced, that guides the adversarial attack towards image pixel value modifications having minimal impact on the perceived image quality. The CW-SSIM loss (Wang, Bovik, Sheikh, & Simoncelli, 2004) is employed to this end.

  • A novel deep neural layer composed of a number of novel deep non-linear one-class classifiers (SVDD layer), equal to the number of classes supported by the model to be protected, is proposed as an adversarial defense mechanism. The parameters of the SVDD layer are trained by exploiting novel loss functions inspired by the Support Vector Data Description (Tax & Duin, 2004). The SVDD layer is thereby used to replace the standard linear classification layer of a pre-trained reference deep neural architecture, introducing non-linearity to the classifier decision function.

  • In black-box attack settings, the proposed defense mechanism acts as an input verification mechanism, i.e., if an input data vector does not belong to any of the training classes (i.e., it is classified as an outlier by every SVDD classifier), it is flagged as an adversarial example. The proposed defense mechanism can be applied merely as a post-processing step after model inference, and does not hinder the simultaneous application of other defense mechanisms.
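The exact K-A3 optimization problem is defined later in the paper and is not reproduced in this excerpt. Purely as an illustration of the first two contributions above, the following PyTorch sketch crafts K perturbed copies of an image, each steering the prediction towards a different highly ranked class, while a similarity penalty keeps the perturbation small; the target-selection rule and the plain L2 similarity term (standing in for the CW-SSIM loss used by the authors) are simplifying assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def k_spread_attack(model, image, k=5, steps=50, lr=0.01, sim_weight=10.0):
    """Illustrative sketch only: craft k perturbed copies of `image`, each
    steering the classifier towards a different highly ranked (non-true)
    class, so that the crafted examples are spread over k ranked positions.
    A plain L2 penalty stands in for the CW-SSIM visual similarity loss."""
    with torch.no_grad():
        # Classes ranked by the clean prediction; assumes the clean image is
        # correctly classified, so index 0 holds the true class.
        ranked = model(image).argsort(dim=1, descending=True)[0]
    adversarials = []
    for i in range(k):
        target = ranked[i + 1].unsqueeze(0)   # a different target class per copy
        delta = torch.zeros_like(image, requires_grad=True)
        optimizer = torch.optim.Adam([delta], lr=lr)
        for _ in range(steps):
            logits = model((image + delta).clamp(0.0, 1.0))
            attack_loss = F.cross_entropy(logits, target)
            similarity_loss = sim_weight * delta.pow(2).mean()
            optimizer.zero_grad()
            (attack_loss + similarity_loss).backward()
            optimizer.step()
        adversarials.append((image + delta).clamp(0.0, 1.0).detach())
    return adversarials
```

In the actual method, the ranking constraint and the visual similarity term are embedded in a single optimization problem; the sketch only conveys the intuition of spreading the crafted examples over K ranked output positions.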

Section snippets

Background information and related work

Let $\mathbf{x} \in \mathbb{R}^D$ be a general data sample (e.g., a facial image for face recognition) having a discrete ground truth label $y \in \mathcal{Y} = \{1, \ldots, C\}$ representing one of the $C$ classes corresponding to, e.g., facial image identities. Also let a neural network architecture consisting of $L$ layers, having a trainable parameter set $\mathcal{W} = \{\mathbf{W}_i\}_{i=1}^{L}$, where $\mathbf{W}_i$ contains the $i$th layer weights, and also let a classifier decision function $f: \mathbb{R}^D \rightarrow \mathbb{R}^C$ that maps the input samples to decision values, corresponding to each class. The sample $\mathbf{x}$

Privacy protection against deep neural networks

Let $\mathcal{S} = \{\mathcal{X}, \mathcal{Y}\}$ be an image classification domain that a neural network classifier with trainable parameters $\mathcal{W}$ is very knowledgeable of, such that its decision function is able to map samples originating from this domain to their corresponding label space. That is, for any given sample $\mathbf{x} \in \mathcal{X}$, the neural network classifier is able to recover the label $y \in \mathcal{Y}$ by its decision function, e.g., $\arg\max f(\mathbf{x}; \mathcal{W}) = y$. Adversarial attacks typically generate the perturbation as a mapping to a space of similar

Multiple SVDD defense

The proposed M-SVDD-D method assumes having an undefended pretrained multi-class deep neural network architecture with a parameter set $\mathcal{W}$, consisting of $L$ layers, where its final layer $\mathbf{W}_L$ involves inference with a linear multiclass classifier layer, supporting $C$ classes. Our defense strategy involves creating a modified architecture $\tilde{\mathcal{W}}$, by replacing this linear classifier layer with $C$ non-linear one-class classifiers, based on the SVDD method (Tax & Duin, 2004). Each one-class classifier acts
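Since this section is truncated in the excerpt, the following is only a rough PyTorch sketch of the general idea: the final linear layer is replaced by $C$ distance-based one-class decisions, each with a learned center and radius in the feature space of the pre-trained backbone. The class name `MultiSVDDLayer`, the parameterization, and the decision rule below are illustrative assumptions, not the authors' exact SVDD-based formulation or training losses.

```python
import torch
import torch.nn as nn

class MultiSVDDLayer(nn.Module):
    """One hypersphere (center + radius) per class in the feature space of a
    pre-trained backbone. A sample is assigned to the class of the best-matching
    center; if it falls outside every hypersphere it is flagged as an outlier
    (a potential adversarial example)."""

    def __init__(self, feature_dim, num_classes):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feature_dim))
        self.log_radii = nn.Parameter(torch.zeros(num_classes))  # radii kept positive via exp

    def forward(self, features):
        # Squared Euclidean distance of each sample to each class center.
        dists = torch.cdist(features, self.centers).pow(2)   # (batch, num_classes)
        radii_sq = self.log_radii.exp().pow(2)                # (num_classes,)
        scores = radii_sq.unsqueeze(0) - dists                # > 0: inside the class hypersphere
        return scores

    def predict(self, features):
        scores = self.forward(features)
        labels = scores.argmax(dim=1)
        is_outlier = scores.max(dim=1).values < 0              # rejected by every one-class classifier
        return labels, is_outlier
```

At inference time, the backbone's penultimate-layer features would be passed through `predict`; samples rejected by every one-class classifier can be flagged as potential adversarial examples, matching the verification behavior described for the black-box setting.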

Experiments

In order to prove the concepts and evaluate the performance of the proposed methods, we have performed two sets of experiments, corresponding to the adversarial attack and adversarial defense scenarios. We have employed four publicly available image classification datasets, namely the MNIST (digit classification) (LeCun, Bottou, Bengio, & Haffner, 1998), CIFAR-10 (object recognition) (Krizhevsky & Hinton, 2009), Yale (face recognition) (Georghiades, Belhumeur, & Kriegman, 2001) and GTS (Traffic Sign

Conclusion

In this paper, an adversarial attack method was developed that supports and respects the proposed K-Anonymity-inspired requirements. In addition, the defense mechanism proposed in this paper can be used to: (a) increase adversarial attack failure rates, (b) increase the noise energy that must be added to standard examples in order for the model to be deceived, (c) prevent adversarial attacks. Moreover, it was shown that the optimization problem of the SVDD classifier can be effectively solved in its

Acknowledgment

This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no 731667 (MULTIDRONE). This publication reflects only the authors’ views. The European Commission is not responsible for any use that may be made of the information it contains.

References (55)

  • Mygdalis, V., et al. (2018). Semi-supervised subclass support vector data description for image and video classification. Neurocomputing.
  • Zhu, J., et al. (2017). Multi-label convolutional neural network based pedestrian attribute classification. Image and Vision Computing.
  • Akhtar, N., et al. (2018). Threat of adversarial attacks on deep learning in computer vision: A survey. IEEE Access.
  • Athalye, A., et al. (2018). On the robustness of the CVPR 2018 white-box adversarial example defenses.
  • Athalye, A., et al. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples.
  • Batrinca, B., et al. (2015). Social media analytics: a survey of techniques, tools and platforms. AI & Society.
  • Campan, A., et al. Data and structural k-anonymity in social networks.
  • Carlini, N., et al. Towards evaluating the robustness of neural networks.
  • Carlini, N., et al. Adversarial examples are not easily detected: Bypassing ten detection methods.
  • Chakraborty, A., et al. (2018). Adversarial attacks and defences: A survey.
  • Das, N., et al. (2017). Keeping the bad guys out: Protecting and vaccinating deep learning with JPEG compression.
  • Dhillon, G. S., Azizzadenesheli, K., Bernstein, J. D., Kossaifi, J., Khanna, A., Lipton, Z. C., & Anandkumar, A....
  • Domingo-Ferrer, J., et al. (2016). Database anonymization: privacy models, data utility, and microaggregation-based inter-model connections.
  • Drineas, P., et al. (2005). On the Nyström method for approximating a Gram matrix for improved kernel-based learning. Journal of Machine Learning Research.
  • Fürnkranz, J., et al. (2008). Multilabel classification via calibrated label ranking. Machine Learning.
  • Georghiades, A. S., et al. (2001). From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Goodfellow, I., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. In International...
  • Guo, C., Rana, M., Cisse, M., & van der Maaten, L. (2018). Countering adversarial images using input transformations....
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE...
  • Hinton, G. E., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. In Deep learning and...
  • Kingma, D. P., et al. (2014). Adam: A method for stochastic optimization.
  • Krizhevsky, A., et al. (2009). Learning multiple layers of features from tiny images. Tech. rep.
  • Kurakin, A., Goodfellow, I. J., & Bengio, S. (2017). Adversarial machine learning at scale. In International conference...
  • LeCun, Y., et al. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE.
  • Liao, F., Liang, M., Dong, Y., Pang, T., Hu, X., & Zhu, J. (2018). Defense against adversarial attacks using high-level...
  • Liu, Y., et al. (2017). Protecting privacy in shared photos via adversarial examples based stealth. Security and Communication Networks.
  • Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2018). Towards deep learning models resistant to...