Defending SVMs against Poisoning Attacks: the Hardness and DBSCAN Approach
arXiv - CS - Computational Geometry. Pub Date: 2020-06-14, DOI: arxiv-2006.07757
Hu Ding, Fan Yang, Jiawei Huang

Adversarial machine learning has attracted a great amount of attention in recent years. In a poisoning attack, the adversary injects a small number of specially crafted samples into the training data so that the decision boundary deviates severely, causing unexpected misclassifications. Given the importance and widespread use of support vector machines (SVMs), we consider defending SVMs against poisoning attacks in this paper. We study two commonly used defense strategies: designing robust SVM algorithms and data sanitization. Although several robust SVM algorithms have been proposed, most of them either lack adversarial resilience or rely on strong assumptions about the data distribution or the attacker's behavior; moreover, research on their computational complexity is still quite limited. To the best of our knowledge, we are the first to prove that even the simplest hard-margin one-class SVM with outliers problem is NP-complete and admits no fully polynomial-time approximation scheme (FPTAS) unless P$=$NP, meaning that even an approximate solution is hard to compute efficiently. For the data-sanitization defense, we link it to the intrinsic dimensionality of the data; in particular, we provide a sampling theorem in doubling metrics that explains the effectiveness of DBSCAN (as a density-based outlier-removal method) for defending against poisoning attacks. In our empirical experiments, we compare several defenses, including the DBSCAN and robust SVM methods, and investigate how intrinsic dimensionality and data density affect their performance.
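
To make the data-sanitization defense concrete, here is a minimal sketch (not from the paper) of the general idea: run DBSCAN within each class and discard the points it labels as noise before training an SVM. It assumes scikit-learn's DBSCAN and SVC; the hyperparameters (eps, min_samples, C) and the synthetic poisoned data are illustrative placeholders, not the paper's settings.

```python
# Minimal sketch of DBSCAN-based data sanitization before SVM training.
# Not the paper's implementation; eps, min_samples and C are illustrative
# placeholders that would need tuning for a real dataset.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.svm import SVC

def sanitize_with_dbscan(X, y, eps=0.8, min_samples=5):
    """Drop points that DBSCAN marks as noise (label -1) within each class.

    Running DBSCAN per class reflects the density-based intuition:
    poisoned samples are assumed to lie in low-density regions of the
    class they claim to belong to.
    """
    keep = np.zeros(len(X), dtype=bool)
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        clusters = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X[idx])
        keep[idx[clusters != -1]] = True
    return X[keep], y[keep]

# Usage: two well-separated clean classes plus a few crafted,
# mislabeled points injected between them.
rng = np.random.default_rng(0)
X_clean = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])
y_clean = np.array([0] * 100 + [1] * 100)
X_poison = rng.normal(2, 0.1, (3, 2))                 # crafted outliers
y_poison = np.ones(3, dtype=int)                      # implausible label
X_train = np.vstack([X_clean, X_poison])
y_train = np.concatenate([y_clean, y_poison])

X_san, y_san = sanitize_with_dbscan(X_train, y_train)
clf = SVC(kernel="linear", C=1.0).fit(X_san, y_san)
print(f"kept {len(X_san)}/{len(X_train)} points after sanitization")
```

Note that eps and min_samples jointly set the density threshold: too small an eps also removes legitimate low-density points, while too large an eps lets a tight cluster of poisoned points survive, which is consistent with the paper's point that the defense's effectiveness depends on the data's intrinsic dimensionality and density.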

Updated: 2020-09-17