usfAD: a robust anomaly detector based on unsupervised stochastic forest
International Journal of Machine Learning and Cybernetics (IF 3.1), Pub Date: 2020-11-02, DOI: 10.1007/s13042-020-01225-0
Sunil Aryal, K.C. Santosh, Richard Dazeley

In real-world applications, the same data can be expressed in different units or scales: for example, weight in kilograms or pounds, and fuel efficiency in km/l or l/100 km. One unit can be a linear or a non-linear scaling of another. The variation in measurements caused by non-linear scaling makes Anomaly Detection (AD) challenging. Most existing AD algorithms rely on distance- or density-based functions, which makes them sensitive to how the data is expressed; in other words, they are representation dependent. To avoid this problem, we introduce a new anomaly detection method, which we call ‘usfAD: Unsupervised Stochastic Forest-based Anomaly Detector’. Our empirical evaluation on synthetic and real-world cybersecurity datasets (spam detection, malicious URL detection and intrusion detection) shows that our approach is more robust to variation in the units/scales used to express data. It produces more consistent and better results than five state-of-the-art AD methods: local outlier factor, one-class support vector machine, isolation forest, nearest neighbor in a random subsample of data, and a simple histogram-based probabilistic method.
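
The representation dependence described in the abstract can be illustrated with a small sketch. It is not the usfAD algorithm and not any of the five baselines; it is a toy nearest-neighbour distance score, and the function names and the six fuel-efficiency values are hypothetical, chosen only for illustration. The same cars are scored once in km/l and once in l/100 km (a non-linear rescaling, y = 100/x), and the point flagged as most anomalous changes.

```python
# Toy illustration (assumed data, not from the paper): a distance-based
# anomaly score gives different answers for the same cars depending on
# whether fuel efficiency is expressed in km/l or l/100 km.

def nn_distance_scores(values):
    """Score each point by the distance to its nearest neighbour.

    A larger score means the point is more isolated, i.e. more anomalous.
    """
    scores = []
    for i, v in enumerate(values):
        scores.append(min(abs(v - w) for j, w in enumerate(values) if j != i))
    return scores


def most_anomalous(values):
    """Return the index of the point with the largest nearest-neighbour distance."""
    scores = nn_distance_scores(values)
    return max(range(len(values)), key=scores.__getitem__)


# Hypothetical fuel-efficiency readings for six cars, in km/l.
km_per_l = [4.0, 6.0, 10.0, 11.0, 12.0, 25.0]

# The same six cars expressed in l/100 km (non-linear transform y = 100 / x).
l_per_100km = [100.0 / x for x in km_per_l]

top_km_per_l = most_anomalous(km_per_l)
top_l_per_100km = most_anomalous(l_per_100km)

print("most anomalous car index in km/l:    ", top_km_per_l)
print("most anomalous car index in l/100 km:", top_l_per_100km)
```

In km/l the most efficient car (25 km/l) is the most isolated point, whereas in l/100 km the least efficient car (4 km/l, i.e. 25 l/100 km) is. The underlying data is identical; only the unit has changed, which is the sensitivity the abstract attributes to distance- and density-based detectors.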




Updated: 2020-11-02