Comment on "AndrODet: An adaptive Android obfuscation detector"
arXiv - CS - Cryptography and Security
Pub Date: 2019-10-14, DOI: arxiv-1910.06192
Authors: Alireza Mohammadinodooshan, Ulf Kargén, Nahid Shahmehri
We have identified a methodological problem in the empirical evaluation of
the string encryption detection capabilities of the AndrODet system described
by Mirzaei et al. in the recent paper "AndrODet: An adaptive Android
obfuscation detector". The accuracy of string encryption detection is evaluated
using samples from the AMD and PraGuard malware datasets. However, the authors
did not account for the fact that many of the AMD samples are highly similar
because they come from the same malware family. This introduces a
risk that a machine learning system trained on these samples could fail to
learn a generalizable model for string encryption detection, and might instead
learn to classify samples based on characteristics of each malware family. Our
own evaluation strongly indicates that the reported high accuracy of AndrODet's
string encryption detection is indeed due to this phenomenon. When we evaluated
AndrODet while ensuring that samples from the same family never appeared in
both training and testing data, the accuracy dropped to around 50%.
Moreover, the PraGuard dataset is not suitable for evaluating a static string
encryption detector such as AndrODet, since the particular obfuscation tool
used to produce the dataset effectively makes it impossible to extract
meaningful features of static strings in Android apps.
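
The family-aware evaluation described above can be illustrated with a minimal sketch of a train/test split that keeps every malware family entirely on one side. This is not the authors' evaluation code: the function name `family_disjoint_split`, the `(sample_id, family)` input format, and the toy family labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def family_disjoint_split(samples, test_fraction=0.3, seed=0):
    """Split (sample_id, family) pairs so that no family spans both sets.

    Hypothetical helper: whole families are assigned to the test set
    until it holds roughly `test_fraction` of all samples; every
    remaining family goes to the training set.
    """
    by_family = defaultdict(list)
    for sample_id, family in samples:
        by_family[family].append(sample_id)

    # Shuffle family names (not individual samples) reproducibly.
    families = sorted(by_family)
    random.Random(seed).shuffle(families)

    target = test_fraction * len(samples)
    train, test = [], []
    for family in families:
        # A family is never divided between the two sets.
        bucket = test if len(test) < target else train
        bucket.extend(by_family[family])
    return train, test


# Toy data: sample identifiers and family labels are made up.
samples = [
    ("a1", "FamA"), ("a2", "FamA"), ("a3", "FamA"),
    ("b1", "FamB"), ("b2", "FamB"),
    ("c1", "FamC"),
    ("d1", "FamD"),
]
train_ids, test_ids = family_disjoint_split(samples)
```

With a plain random split over individual samples, near-duplicates from one family can land on both sides, which is the situation the comment identifies as inflating the reported accuracy. Because families differ in size, the realized test fraction in this sketch is only approximate.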
Updated: 2020-01-22