Online domain description of big data based on hyperellipsoid models

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Big data is typically massive, diverse, time-varying, and high-dimensional. This paper focuses on the domain description of big data, which is the basis for addressing the problems these properties raise. It makes three main contributions. First, a single hyperellipsoid model is proposed for the domain description of big data. Its parameters are adjusted adaptively according to a proposed objective function rather than selected manually, which widens the range of applications of the model. Second, an improved FDPC algorithm is proposed to generate multiple hyperellipsoid models that approximate the spatial distribution of big data, improving the accuracy of the domain description. Multiple hyperellipsoid models not only remove much of the spatial redundancy of a description based on one hyperellipsoid, but also offer a feasible way to describe complex spatial distributions. Third, an online domain description algorithm based on hyperellipsoid models is proposed, which improves the robustness of the models on time-varying data; the parallel processing flow of the algorithm is also given. In the experiments, synthetic instances and real-world datasets are used to test the performance of the hyperellipsoid models. Compared with LOF, OneClassSVM, SVDD, and isolation forest, the proposed method shows competitive and promising performance.
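To make the idea of a hyperellipsoid domain description concrete, the following is a minimal sketch and not the paper's method. The abstract does not state the objective function, so this sketch substitutes a simpler assumption: the ellipsoid is centred at the running sample mean, shaped by the running sample covariance, and sized by an empirical quantile of squared Mahalanobis distances. The class name `Hyperellipsoid`, its methods, and the `coverage` and `ridge` parameters are all hypothetical. The per-point `update` hints at how such a model could follow streaming data.

```python
import numpy as np


class Hyperellipsoid:
    """Region {x : (x - c)^T P (x - c) <= r2}, with c and P taken from sample moments.

    Assumption: this stands in for the paper's objective-driven fit, which the
    abstract does not specify.
    """

    def __init__(self, dim, ridge=1e-6):
        self.n = 0
        self.mean = np.zeros(dim)
        self.scatter = np.zeros((dim, dim))  # running sum of deviation outer products
        self.ridge = ridge                   # small diagonal term keeps the covariance invertible
        self.r2 = np.inf                     # squared radius; set by calibrate()

    def update(self, x):
        """Welford-style running update of mean and scatter with one new point."""
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.scatter += np.outer(delta, x - self.mean)

    def _precision(self):
        cov = self.scatter / max(self.n - 1, 1)
        cov = cov + self.ridge * np.eye(len(self.mean))
        return np.linalg.inv(cov)

    def distance2(self, X):
        """Squared Mahalanobis distance of each row of X to the centre."""
        D = np.atleast_2d(X) - self.mean
        return np.einsum("ij,jk,ik->i", D, self._precision(), D)

    def calibrate(self, X, coverage=0.95):
        """Choose r2 so that roughly `coverage` of the rows of X fall inside."""
        self.r2 = float(np.quantile(self.distance2(X), coverage))

    def contains(self, X):
        return self.distance2(X) <= self.r2


# Usage on synthetic, correlated 2-D data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])

model = Hyperellipsoid(dim=2)
for x in X:                      # points arrive one at a time
    model.update(x)
model.calibrate(X, coverage=0.95)

inside = model.contains(X)                        # boolean mask over the training points
far = model.contains(np.array([[50.0, -50.0]]))   # a point far from the data
```

A single ellipsoid of this kind can cover a lot of empty space when the data form several clusters, which is the redundancy that the abstract's multiple-hyperellipsoid approach is meant to reduce. One such model per cluster would be a natural extension of the sketch.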

Author information

Corresponding author

Correspondence to Zengshuai Qiu.

About this article

Cite this article

Qiu, Z. Online domain description of big data based on hyperellipsoid models. Int. J. Mach. Learn. & Cyber. 12, 2185–2197 (2021). https://doi.org/10.1007/s13042-021-01300-0

  • DOI: https://doi.org/10.1007/s13042-021-01300-0
