Hybrid Consensus Pruning of Ensemble Classifiers for Big Data Malware Detection, IEEE Transactions on Cloud Computing

Hybrid Consensus Pruning of Ensemble Classifiers for Big Data Malware Detection
IEEE Transactions on Cloud Computing (IF 5.3), Pub Date: 2020-04-01, DOI: 10.1109/tcc.2015.2481378
Jemal H. Abawajy, Morshed Chowdhury, Andrei Kelarev

One of the major challenges in safeguarding big data in the cloud is detecting and preventing malicious software (malware). Although security and privacy are critical issues in big data, more research is needed in this area. Because malware can affect the reliability of the data and, in turn, the reputation of the system, it is critical to detect and remove malware from a system as early as possible. Recently, ensembles that combine a set of classifiers have been proposed as an efficient approach to malware detection. Unfortunately, their size, memory and processing requirements, together with the high cost of data transfer during training and operation, make large ensemble classifiers unsuitable for big data in the cloud. To address this problem, we propose a new advanced ensemble pruning method, Hybrid Consensus Pruning (HCP), which is the first pruning algorithm that employs a fast consensus function to combine several classifier classes into one scheme. To test the effectiveness of HCP, we conducted experiments comparing its performance with Ensemble Pruning via Individual Contribution ordering (EPIC), Directed Hill Climbing Ensemble Pruning (DHCEP) and K-Means Pruning for pruning very large ensemble classifiers for malware detection. The experimental results show that HCP produced better ensemble classifiers than those created by EPIC, DHCEP and K-Means Pruning.

Updated: 2020-04-01