Machine Learning for Detecting Data Exfiltration
ACM Computing Surveys (IF 23.8), Pub Date: 2021-05-08, DOI: 10.1145/3442181
Bushra Sabir, Faheem Ullah, M. Ali Babar, Raj Gaire

Context: Research at the intersection of cybersecurity, Machine Learning (ML), and Software Engineering (SE) has recently taken significant steps in proposing countermeasures for detecting sophisticated data exfiltration attacks. It is important to systematically review and synthesize the ML-based data exfiltration countermeasures to build a body of knowledge on this topic. Objective: This article aims to systematically review ML-based data exfiltration countermeasures in order to identify and classify the ML approaches, feature engineering techniques, evaluation datasets, and performance metrics used for these countermeasures. The review also aims to identify gaps in research on ML-based data exfiltration countermeasures. Method: We used the Systematic Literature Review (SLR) method to select and review 92 papers. Results: The review has enabled us to: (a) classify the ML approaches used in the countermeasures into data-driven and behavior-driven approaches; (b) categorize features into six types: behavioral, content-based, statistical, syntactical, spatial, and temporal; (c) classify the evaluation datasets into simulated, synthesized, and real datasets; and (d) identify 11 performance measures used by these studies. Conclusion: We conclude that: (i) the integration of data-driven and behavior-driven approaches should be explored; (ii) there is a need to develop high-quality, large-scale evaluation datasets; (iii) incremental ML model training should be incorporated into countermeasures; (iv) resilience to adversarial learning should be considered and explored during the development of countermeasures to avoid poisoning attacks; and (v) the use of automated feature engineering should be encouraged for efficiently detecting data exfiltration attacks.

Updated: 2021-05-08