Online domain description of big data based on hyperellipsoid models
International Journal of Machine Learning and Cybernetics (IF 3.1), Pub Date: 2021-04-13, DOI: 10.1007/s13042-021-01300-0
Zengshuai Qiu

Big data is usually massive, diverse, time-varying, and high-dimensional. This paper focuses on the domain description of big data, which underpins the handling of these challenges. The paper makes three main contributions. First, a single hyperellipsoid model is proposed for the domain description of big data. The parameters of the model are adjusted adaptively according to the proposed objective function rather than selected manually, which widens the range of applications of the model. Second, an improved FDPC algorithm is proposed that generates multiple hyperellipsoid models to approximate the spatial distribution of big data, improving the accuracy of the domain description. Compared with a description based on one hyperellipsoid, multiple hyperellipsoid models greatly reduce spatial redundancy and also offer a feasible way to describe complex spatial distributions. Third, an online domain description algorithm based on hyperellipsoid models is proposed, which improves the robustness of the models on time-varying data. The parallel processing flow of the algorithm is given. In the experiments, synthetic instances and real-world datasets were used to test the performance of the hyperellipsoid models. Compared with LOF, OneClassSVM, SVDD, and isolation forest, the proposed method shows competitive and promising performance.
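To make the idea of a hyperellipsoid domain description concrete, the following is a minimal sketch and not the paper's algorithm. It assumes that one ellipsoid is defined by a running mean and covariance, that membership is decided by a Mahalanobis distance threshold, and that the ellipsoid is updated one sample at a time with a Welford-style rule. The class name `Hyperellipsoid`, the fixed `radius`, and the `ridge` term are all hypothetical; the abstract states that the paper tunes its parameters through an objective function, which is not reproduced here.

```python
import numpy as np


class Hyperellipsoid:
    """Illustrative single-ellipsoid domain description (not the paper's model)."""

    def __init__(self, dim, radius=3.0, ridge=1e-6):
        self.n = 0                            # number of samples seen so far
        self.mean = np.zeros(dim)             # running centre of the ellipsoid
        self.scatter = np.zeros((dim, dim))   # running sum of deviation outer products
        self.radius = radius                  # assumed fixed Mahalanobis threshold
        self.ridge = ridge                    # small diagonal term to keep covariance invertible

    def update(self, x):
        """Welford-style online update of the mean and scatter with one sample."""
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.scatter += np.outer(delta, x - self.mean)

    def covariance(self):
        """Sample covariance plus the ridge term."""
        cov = self.scatter / max(self.n - 1, 1)
        return cov + self.ridge * np.eye(len(self.mean))

    def distance(self, x):
        """Mahalanobis distance of x from the ellipsoid centre."""
        d = np.asarray(x, dtype=float) - self.mean
        return float(np.sqrt(d @ np.linalg.solve(self.covariance(), d)))

    def contains(self, x):
        """True if x lies inside the ellipsoid of the assumed radius."""
        return self.distance(x) <= self.radius


# Synthetic stream: wide along the first axis, narrow along the second.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])

model = Hyperellipsoid(dim=2)
for sample in data:
    model.update(sample)

inside = model.contains([0.0, 0.0])    # point at the centre of the data
outside = model.contains([0.0, 5.0])   # point far out along the narrow axis
print(inside, outside)
```

Because the update touches only the mean and the scatter matrix, each new sample costs O(d^2) memory regardless of how many samples have been seen, which is one reason an ellipsoid is a convenient shape for streaming data. Describing a complex distribution would need several such ellipsoids, one per cluster, as the abstract indicates.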




Updated: 2021-04-13