usfAD: a robust anomaly detector based on unsupervised stochastic forest, International Journal of Machine Learning and Cybernetics



International Journal of Machine Learning and Cybernetics (IF 3.1), Pub Date: 2020-11-02, DOI: 10.1007/s13042-020-01225-0
Sunil Aryal, K.C. Santosh, Richard Dazeley

In real-world applications, the same data can be expressed in different units or scales: for example, weight in kilograms or pounds, and fuel efficiency in km/l or l/100 km. One unit can be a linear or a non-linear scaling of another. The variation in metrics due to non-linear scaling makes anomaly detection (AD) challenging. Most existing AD algorithms rely on distance- or density-based functions, which makes them sensitive to how the data is expressed; that is, they are representation dependent. To avoid this problem, we introduce a new anomaly detection method, which we call 'usfAD: Unsupervised Stochastic Forest-based Anomaly Detector'. Our empirical evaluation on synthetic and real-world cybersecurity datasets (spam detection, malicious URL detection and intrusion detection) shows that our approach is more robust to variation in the units/scales used to express data. It produces more consistent and better results than five state-of-the-art AD methods: local outlier factor, one-class support vector machine, isolation forest, nearest neighbor in a random subsample of data, and a simple histogram-based probabilistic method.
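The representation dependence described in the abstract can be illustrated with a small sketch. This is not the usfAD algorithm; it is a toy nearest-neighbor distance score, and the fuel-efficiency values and the function names `nn_distance_scores` and `most_anomalous` are made up for illustration. It shows that converting km/l to l/100 km (a non-linear rescaling, y = 100/x) can change which point a distance-based score flags as most anomalous.

```python
def nn_distance_scores(values):
    """Score each value by the distance to its nearest other value.

    A larger score means the point is more isolated (more anomalous).
    """
    return [
        min(abs(v - w) for j, w in enumerate(values) if j != i)
        for i, v in enumerate(values)
    ]


def most_anomalous(values):
    """Return the index of the value with the largest nearest-neighbor distance."""
    scores = nn_distance_scores(values)
    return max(range(len(values)), key=scores.__getitem__)


# Hypothetical fuel-efficiency readings, expressed in km/l.
km_per_l = [4.0, 10.0, 10.5, 11.0, 12.0, 20.0]

# The same readings expressed in l/100 km (non-linear conversion).
l_per_100km = [100.0 / v for v in km_per_l]

top_km_per_l = most_anomalous(km_per_l)
top_l_per_100km = most_anomalous(l_per_100km)

print("Most anomalous reading in km/l:    ", km_per_l[top_km_per_l], "km/l")
print("Most anomalous reading in l/100 km:", km_per_l[top_l_per_100km], "km/l")
```

In km/l the most isolated reading is the most efficient vehicle (20 km/l), while in l/100 km it is the least efficient one (4 km/l), even though the underlying data is identical.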


Updated: 2020-11-02
