A statistical pattern based feature extraction method on system call traces for anomaly detection
Information and Software Technology (IF 3.9), Pub Date: 2020-05-25, DOI: 10.1016/j.infsof.2020.106348
Zhen Liu, Nathalie Japkowicz, Ruoyu Wang, Yongming Cai, Deyu Tang, Xianfa Cai

Context

In host-based anomaly detection, feature extraction from system call traces is essential to building an effective detection model. Many feature extraction methods have been proposed recently, and most of them aim to preserve the positional information of system calls within a trace. The extracted features are generally named after system calls and therefore cannot be used directly in cross-platform applications. In addition, some of these methods are very costly to implement.

Objective

This paper presents a new feature extraction method that aims to extract features independent of system call names. Samples represented by the extracted features can be used directly in cross-platform applications. The method is also lightweight, in that the feature values are inexpensive to compute.

Method

The proposed method first transforms the system calls in a trace into frequency sequences of n-grams and then extracts a fixed number of statistical features from those frequency sequences. The extracted features are independent of the names or indexes of system calls on a platform. Feature values are calculated on the frequency sequences rather than on the system call sequences. The feature vectors built from a training set containing only normal data are then used to train a one-class classification model for anomaly detection.
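
The pipeline described above can be illustrated with a minimal Python sketch. The abstract does not define how the frequency sequence is built or which statistics are used, so everything here is an assumption made for illustration: each n-gram is replaced by its count within the same trace, and the five summary statistics (mean, standard deviation, median, minimum, maximum) are placeholders. The function names and the sample traces are hypothetical and are not taken from the paper.

```python
from collections import Counter
from statistics import mean, median, pstdev


def ngram_frequency_sequence(trace, n=2):
    """Replace each n-gram of the trace by its count within the same trace.

    Assumption: this is one plausible reading of "frequency sequence";
    the abstract does not give the exact definition.
    """
    grams = [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]
    counts = Counter(grams)
    return [counts[g] for g in grams]


def statistical_features(freq_seq):
    """Summarise a frequency sequence as a fixed-length vector.

    Assumption: the five statistics below are illustrative placeholders,
    not the feature set used by the authors.
    """
    if not freq_seq:
        return [0.0] * 5
    return [
        float(mean(freq_seq)),
        float(pstdev(freq_seq)),
        float(median(freq_seq)),
        float(min(freq_seq)),
        float(max(freq_seq)),
    ]


def extract_features(trace, n=2):
    """Map a system call trace to a name-independent feature vector."""
    return statistical_features(ngram_frequency_sequence(trace, n))


# Two hypothetical traces with the same behaviour but different call names.
trace_a = ["open", "read", "read", "close", "open", "read"]
trace_b = ["NtOpen", "NtRead", "NtRead", "NtClose", "NtOpen", "NtRead"]

print(ngram_frequency_sequence(trace_a))
print(extract_features(trace_a) == extract_features(trace_b))
```

Because the statistics are computed on counts rather than on the calls themselves, renaming every system call consistently leaves the feature vector unchanged, which is the property the abstract relies on for cross-platform use. Feature vectors produced this way from normal traces only would then be passed to a one-class classifier of the reader's choice.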

Results

We compared our method with four previously proposed feature extraction methods for system call traces. On a single platform, our method does not always obtain the highest AUC, but overall it performs better than all of the compared methods. In cross-platform testing, it performs best among all compared methods.
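
For readers unfamiliar with the metric, the sketch below shows one standard way to compute AUC from anomaly scores: the fraction of (anomalous, normal) pairs in which the anomalous trace receives the higher score, with ties counted as half. The function name and the example scores are hypothetical; the abstract does not describe the authors' evaluation protocol, so this only illustrates the metric itself.

```python
def auc(normal_scores, anomaly_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen anomalous trace has a
    higher anomaly score than a randomly chosen normal one (ties count 0.5).
    """
    if not normal_scores or not anomaly_scores:
        raise ValueError("need at least one score from each class")
    wins = 0.0
    for a in anomaly_scores:
        for s in normal_scores:
            if a > s:
                wins += 1.0
            elif a == s:
                wins += 0.5
    return wins / (len(normal_scores) * len(anomaly_scores))


# Hypothetical anomaly scores produced by a one-class model.
print(auc([0.1, 0.4], [0.35, 0.8]))
```

An AUC of 1.0 means every anomalous trace is scored above every normal trace, and 0.5 corresponds to random ranking. The pairwise form is quadratic in the number of traces, which is acceptable for a sketch but not for large test sets.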

Conclusion

The features extracted by our method are platform-independent and are suitable for anomaly detection across platforms.



Updated: 2020-05-25