An efficient sampling-based visualization technique for big data clustering with crisp partitions

Distributed and Parallel Databases

Abstract

Assessing cluster tendency is an emerging need in big data cluster analysis. Cluster tendency is the assessment of how many clusters a dataset contains, made before the data are clustered. Many visualization techniques have been developed for detecting cluster tendency. Existing techniques such as Visual Assessment of Tendency (VAT), spectral-based VAT (SpecVAT), and improved VAT (iVAT) have been considerably successful at assessing cluster tendency for small datasets. bigVAT is a more recently developed method for estimating the cluster tendency of big data, and it is well suited to presenting that tendency in visual form. However, exploring the data clusters themselves remains intractable for large volumes of data objects. The proposed work addresses this clustering problem of bigVAT by deriving sampling-based crisp partitions, which accurately predict the cluster labels of the data objects. Experiments on big synthetic and big real-life datasets demonstrate the performance efficiency of the proposed work.
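
To make the idea of a sampling-based crisp partition concrete, the following is a minimal Python sketch, not the article's algorithm. It assumes Euclidean dissimilarities, uniform random sampling, a known number of clusters `k`, and nearest-sample label extension; the function names `vat_order`, `pairwise`, and `sampled_crisp_partition` are hypothetical. The reordering step follows the standard VAT procedure (start at an endpoint of the largest dissimilarity, then repeatedly append the object closest to the ordered set).

```python
import numpy as np

def vat_order(D):
    """VAT reordering of a square dissimilarity matrix D.

    Returns the ordering and, for each position, the dissimilarity that
    connected that object to the already-ordered set (0 for the first).
    """
    n = D.shape[0]
    # Start from one endpoint of the largest dissimilarity.
    first = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order, weights = [first], [0.0]
    unvisited = np.ones(n, dtype=bool)
    unvisited[first] = False
    nearest = D[first].astype(float).copy()  # distance to the ordered set
    for _ in range(n - 1):
        nxt = int(np.argmin(np.where(unvisited, nearest, np.inf)))
        order.append(nxt)
        weights.append(float(nearest[nxt]))
        unvisited[nxt] = False
        nearest = np.minimum(nearest, D[nxt])
    return np.array(order), np.array(weights)

def pairwise(A, B):
    """Euclidean distances between the rows of A and the rows of B."""
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))

def sampled_crisp_partition(X, k, sample_size, seed=0):
    """Assumed scheme: VAT on a random sample, cut into k blocks, extend."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=sample_size, replace=False)
    S = X[idx]
    order, weights = vat_order(pairwise(S, S))
    # Cut the ordering at the k-1 largest connecting dissimilarities.
    if k > 1:
        cuts = np.sort(np.argsort(weights)[-(k - 1):])
    else:
        cuts = np.array([], dtype=int)
    block = np.searchsorted(cuts, np.arange(sample_size), side="right")
    sample_labels = np.empty(sample_size, dtype=int)
    sample_labels[order] = block
    # Each object takes the crisp label of its nearest sampled object.
    # (For truly large X this distance step would be done in chunks.)
    return sample_labels[np.argmin(pairwise(X, S), axis=1)]

# Two well-separated synthetic blobs of 1000 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, (1000, 2)),
               rng.normal(10.0, 0.5, (1000, 2))])
labels = sampled_crisp_partition(X, k=2, sample_size=100)
print(np.unique(labels[:1000]), np.unique(labels[1000:]))
```

Only the 100 sampled objects enter the quadratic-cost reordering; every other object needs just one nearest-sample lookup, which is what keeps a scheme of this kind tractable as the dataset grows.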

Funding

This study was funded by the Science and Engineering Research Board (Grant No. ECR/2016/001556).

Author information

Corresponding author

Correspondence to K. Rajendra Prasad.

Cite this article

Rajendra Prasad, K., Mohammed, M., Narasimha Prasad, L.V. et al. An efficient sampling-based visualization technique for big data clustering with crisp partitions. Distrib Parallel Databases 39, 813–832 (2021). https://doi.org/10.1007/s10619-021-07324-3

  • DOI: https://doi.org/10.1007/s10619-021-07324-3
