Deep neural rejection against adversarial examples
EURASIP Journal on Information Security (IF 2.5). Pub Date: 2020-04-07. DOI: 10.1186/s13635-020-00105-y
Angelo Sotgiu, Ambra Demontis, Marco Melis, Battista Biggio, Giorgio Fumera, Xiaoyi Feng, Fabio Roli

Despite the impressive performance reported by deep neural networks in different application domains, they remain largely vulnerable to adversarial examples, i.e., input samples that are carefully perturbed to cause misclassification at test time. In this work, we propose a deep neural rejection mechanism to detect adversarial examples, based on the idea of rejecting samples that exhibit anomalous feature representations at different network layers. Compared with competing approaches, our method does not require generating adversarial examples at training time, and it is less computationally demanding. To properly evaluate our method, we define an adaptive white-box attack that is aware of the defense mechanism and aims to bypass it. Under this worst-case setting, we empirically show that our approach outperforms previously proposed methods that detect adversarial examples by analyzing only the feature representation provided by the output network layer.
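To make the rejection idea concrete, the sketch below shows one plausible instantiation, not the paper's exact implementation: it assumes per-layer RBF-SVM scorers, a combiner fit on the stacked per-layer scores, and rejection whenever no class score clears a threshold. Random arrays stand in for the real layer activations a trained DNN would provide (e.g., via forward hooks); all shapes and hyperparameters here are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy stand-ins for the representations a trained DNN would expose at
# three internal layers; 300 samples, 3 classes, layer widths arbitrary.
n_samples, n_classes = 300, 3
y = rng.integers(0, n_classes, size=n_samples)
layer_reprs = [rng.normal(size=(n_samples, d)) + y[:, None]
               for d in (64, 32, 16)]

def fit_ovr_rbf(Z, labels):
    # One binary RBF-SVM per class (one-vs-rest). With an RBF kernel the
    # decision value decays toward the bias far from the training data,
    # which is what makes threshold-based rejection possible.
    return [SVC(kernel="rbf").fit(Z, (labels == c).astype(int))
            for c in range(n_classes)]

def ovr_scores(svms, Z):
    # One column of decision scores per class.
    return np.column_stack([svm.decision_function(Z) for svm in svms])

# Per-layer scorers, then a combiner fit on the stacked layer scores.
layer_svms = [fit_ovr_rbf(Z, y) for Z in layer_reprs]

def stacked(reprs):
    return np.hstack([ovr_scores(svms, Z)
                      for svms, Z in zip(layer_svms, reprs)])

combiner = fit_ovr_rbf(stacked(layer_reprs), y)

def predict_with_reject(reprs, threshold=0.0):
    """Predict a class, or -1 (reject) when no class score clears the
    threshold, i.e., the layer representations look anomalous."""
    scores = ovr_scores(combiner, stacked(reprs))
    preds = scores.argmax(axis=1)
    preds[scores.max(axis=1) < threshold] = -1
    return preds

# In-distribution inputs vs. far-off-manifold ones (a crude stand-in for
# inputs whose intermediate representations are anomalous).
clean = [rng.normal(size=(3, d)) + 1.0 for d in (64, 32, 16)]
weird = [rng.normal(size=(3, d)) + 10.0 for d in (64, 32, 16)]
print(predict_with_reject(clean), predict_with_reject(weird))
```

Note that under the adaptive white-box setting described in the abstract, the adversary knows this detector and perturbs inputs so that the combined scores also clear the rejection threshold, which is why the defense is evaluated against defense-aware attacks rather than standard ones.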

Updated: 2020-04-16