Abstract
We use distributionally-robust optimization to mitigate the effect of data poisoning attacks on machine learning. By training the model against the worst-case distribution in a Wasserstein-distance neighbourhood of the empirical distribution (extracted from the training dataset corrupted by a poisoning attack), we obtain performance guarantees for the trained model on the original data (excluding the poisoned records). We relax the distributionally-robust machine learning problem by upper-bounding the worst-case fitness with the empirical sample-averaged fitness plus a regularizer given by the Lipschitz constant of the fitness function (with respect to the data, for fixed model parameters). For regression models, we prove that this regularizer equals the dual norm of the model parameters.
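The dual-norm result above can be sketched numerically. The snippet below is a minimal illustration, not the paper's exact algorithm: it assumes an absolute-error fitness and an ℓ∞ transport cost, under which the Lipschitz-based regularizer reduces to the ℓ1 norm of the parameter vector (the dual norm), so the relaxed distributionally-robust problem becomes empirical risk plus an ℓ1 penalty whose weight plays the role of the Wasserstein-ball radius. The function name `dro_regression` and all hyperparameters are illustrative choices.

```python
import numpy as np

def dro_regression(X, y, eps=0.01, lr=0.01, steps=2000):
    """Subgradient descent on the relaxed DRO objective
        (1/n) * sum_i |y_i - x_i^T theta| + eps * ||theta||_1,
    i.e. empirical fitness plus the dual-norm regularizer, with
    eps standing in for the Wasserstein-ball radius."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        residual = X @ theta - y
        # Subgradient of the mean absolute error plus the l1 penalty.
        grad = X.T @ np.sign(residual) / n + eps * np.sign(theta)
        theta -= lr * grad
    return theta

# Synthetic regression data with a known parameter vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
theta_true = np.array([1.0, -2.0, 0.0])
y = X @ theta_true + 0.05 * rng.normal(size=200)

theta_hat = dro_regression(X, y)
print(theta_hat)  # approximately recovers [1, -2, 0]
```

Larger values of `eps` (a larger Wasserstein ball, i.e. a stronger assumed poisoning budget) shrink the estimate further toward zero, which is the mechanism by which the regularizer buys robustness.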
Acknowledgements
Funding was provided by the University of Melbourne.
Cite this article
Farokhi, F. Why Does Regularization Help with Mitigating Poisoning Attacks?. Neural Process Lett 53, 2933–2945 (2021). https://doi.org/10.1007/s11063-021-10539-1