Randomized nonlinear one-class support vector machines with bounded loss function to detect of outliers for large scale IoT data
Future Generation Computer Systems ( IF 7.5 ) Pub Date : 2020-06-19 , DOI: 10.1016/j.future.2020.05.045
Imran Razzak , Khurram Zafar , Muhammad Imran , Guandong Xu

The enormous deployment of IoT data-acquisition devices has led to exponential growth of large-scale data in the industrial Internet of Things. Detecting unusual patterns in large-scale IoT data is an important but challenging task. Recently, one-class support vector machines have been used extensively for anomaly detection. A one-class support vector machine tries to find an optimal hyperplane in high-dimensional data that best separates the data from anomalies with the maximum margin. However, the hinge loss of the traditional one-class support vector machine is unbounded, so outliers can incur arbitrarily large losses, which degrades anomaly detection performance. Furthermore, existing methods are computationally expensive for large datasets. In this paper, we present a novel anomaly detection approach for large-scale data that uses randomized nonlinear features in a support vector machine with a bounded loss function, rather than finding optimized support vectors with an unbounded loss function. Extensive experimental evaluation on ten benchmark datasets shows the robustness of the proposed approach against outliers, with accuracies of 0.8239, 0.7921, 0.7501, 0.6711, 0.6692, 0.4789, 0.6462, 0.6812, 0.7271 and 0.7873 for the Gas Sensor Array, Human Activity Recognition, Parkinson's, Hepatitis, Breast Cancer, Blood Transfusion, Heart, ILPD and Wholesale Customers datasets, respectively. In addition, introducing randomized nonlinear features considerably decreases the computational complexity from O(N³) to O(Bkn) and the space complexity from O(N²) to O(Bkn), which makes the approach very attractive for larger datasets.
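The abstract does not say which bounded loss or which randomized feature construction the authors use. As a minimal sketch, assuming random Fourier features for an RBF kernel (the Rahimi and Recht construction) and a ramp loss (truncated hinge) as the bounded loss, the NumPy code below illustrates the two mechanisms the abstract relies on: an explicit randomized nonlinear feature map whose inner products approximate a kernel, and a loss that caps the contribution of any single outlier where the hinge loss does not. The function names `make_random_fourier_map`, `hinge_loss` and `ramp_loss` and the parameters `gamma`, `n_features` and `cap` are hypothetical and chosen for illustration only; this is not the paper's implementation.

```python
import numpy as np


def make_random_fourier_map(d, n_features=2000, gamma=0.1, seed=0):
    """Return z with z(X) @ z(Y).T ~= exp(-gamma * ||x - y||^2) (assumed RBF kernel).

    Random Fourier features: frequencies are drawn from the RBF kernel's
    spectral density N(0, 2 * gamma * I) and phases from U[0, 2*pi).
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def z(X):
        # (n, d) -> (n, n_features); the same W and b are reused on every call,
        # so training and test points share one feature map.
        X = np.asarray(X, dtype=float)
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    return z


def hinge_loss(v):
    """Unbounded: grows linearly with the violation v = rho - f(x)."""
    return np.maximum(0.0, v)


def ramp_loss(v, cap=1.0):
    """Bounded (truncated hinge): a single point contributes at most `cap`."""
    return np.minimum(np.maximum(0.0, v), cap)


# Inner products of the random features approximate the exact kernel value.
z = make_random_fourier_map(d=2, n_features=20000, gamma=0.5)
Z = z([[0.0, 0.0], [1.0, 0.0]])
print("approx kernel:", Z[0] @ Z[1], "exact:", np.exp(-0.5))

# Two ordinary margin violations and one gross outlier.
v = np.array([0.1, 0.2, 50.0])
print("hinge total:", hinge_loss(v).sum())  # dominated by the outlier
print("ramp total:", ramp_loss(v).sum())    # outlier's term capped at 1.0
```

Materializing the exact RBF kernel matrix for n points requires n × n entries, whereas `z(X)` has only n × `n_features` entries, so a linear one-class model can be trained on `z(X)` without ever forming the kernel matrix. The abstract does not define B, k and n in its O(Bkn) bound, so this sketch makes no claim about matching that figure.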

Updated: 2020-06-19