KronoDroid: Time-based hybrid-featured dataset for effective android malware detection and characterization
Computers & Security ( IF 5.6 ) Pub Date : 2021-07-09 , DOI: 10.1016/j.cose.2021.102399
Alejandro Guerra-Manzanares 1 , Hayretdin Bahsi 1 , Sven Nõmm 1

Available datasets have neglected the evolution of Android malware, providing only a static snapshot of a non-stationary phenomenon. Android malware research has not given the time variable the attention it deserves, overlooking its degenerative impact on the performance of machine learning-based classifiers (i.e., concept drift). The sources of dynamic data and their particularities (i.e., real devices and emulators) have also been overlooked. These are critical factors to take into account when aiming to build more effective, robust, and long-lasting Android malware detection systems. In this research, different sources of benign and malware data are merged to generate a dataset that encompasses a larger time frame, and 489 static and dynamic features are collected. The particularities of the source of the dynamic features (i.e., system calls) are addressed by using both an emulator and a real device, thus generating two equally featured sub-datasets. The main outcome of this research is a novel, labeled, hybrid-featured Android dataset that provides timestamps for each data sample, covers all years of Android history from 2008 to 2020, and considers the distinct dynamic data sources. The emulator dataset is composed of 28,745 malicious apps from 209 malware families and 35,246 benign samples. The real device dataset contains 41,382 malware samples, belonging to 240 malware families, and 36,755 benign apps. Made publicly available as KronoDroid in a structured format, it is the largest hybrid-featured Android dataset and the only one that provides timestamped data, considers the particularities of dynamic sources, and includes samples from over 209 Android malware families.
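To make the reported figures and the role of timestamps concrete, the following is a minimal Python sketch, not the authors' code. The sample counts are taken from the abstract; the `Sample` class, its `year` and `is_malware` fields, and the `chronological_split` helper are hypothetical names introduced here for illustration and do not reflect the actual KronoDroid file layout. The split shows one common way timestamped data could be used to study concept drift: train on older samples and test on newer ones.

```python
from dataclasses import dataclass

# Sample counts as reported in the abstract.
COUNTS = {
    "emulator": {"malware": 28_745, "benign": 35_246},
    "real_device": {"malware": 41_382, "benign": 36_755},
}


def summarize(counts):
    """Return the total size and malware share of each sub-dataset."""
    summary = {}
    for name, c in counts.items():
        total = c["malware"] + c["benign"]
        summary[name] = {
            "total": total,
            "malware_share": c["malware"] / total,
        }
    return summary


@dataclass
class Sample:
    # Hypothetical fields: the real dataset's column names may differ.
    year: int          # year derived from the sample's timestamp
    is_malware: bool   # label


def chronological_split(samples, cutoff_year):
    """Put samples up to cutoff_year in train and later ones in test."""
    train = [s for s in samples if s.year <= cutoff_year]
    test = [s for s in samples if s.year > cutoff_year]
    return train, test
```

Under these assumptions, `summarize(COUNTS)` gives 63,991 samples for the emulator sub-dataset and 78,137 for the real device sub-dataset. A classifier trained on the `train` part of a chronological split and evaluated on the `test` part would be exposed to any drift between the two periods, which a random split would hide.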




Updated: 2021-07-29