An efficient sampling-based visualization technique for big data clustering with crisp partitions

Rajendra Prasad, K.; Mohammed, Moulana; Narasimha Prasad, L. V.; Anguraj, Dinesh Kumar

doi:10.1007/s10619-021-07324-3

Published: 19 February 2021

Distributed and Parallel Databases, Volume 39, pages 813–832 (2021)

K. Rajendra Prasad ORCID: orcid.org/0000-0002-8366-4149¹,
Moulana Mohammed²,
L. V. Narasimha Prasad³ &
Dinesh Kumar Anguraj²

Abstract

Assessing cluster tendency is an emerging need in big data cluster analysis. Cluster tendency refers to evaluating a dataset in terms of the number of clusters it contains. Many visualization techniques have been developed for detecting cluster tendency. Existing techniques such as Visual Assessment of Tendency (VAT), spectral-based VAT (SpecVAT), and improved VAT (iVAT) have been considerably successful at assessing cluster tendency for small datasets. bigVAT is another, more recently developed method for estimating the cluster tendency of big data. It is well suited to deriving the clustering tendency of big data in visual form. However, it is intractable for exploring the data clusters themselves when the volume of data objects is large. The proposed work addresses this clustering problem of bigVAT by deriving sampling-based crisp partitions, which accurately predict the cluster labels of data objects. Experiments on big synthetic and big real-life datasets demonstrate the performance efficiency of the proposed work.
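
For orientation, the two ideas named in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the paper's algorithm: `vat_order` follows the commonly described VAT reordering (start at an endpoint of the largest dissimilarity, then repeatedly append the unordered object closest to the ordered set), and `extend_labels` assumes, hypothetically, that a crisp partition of a sample is extended to all objects by a nearest-sampled-object rule. The function names, the Euclidean dissimilarity, and the toy data are all assumptions made for the example.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean dissimilarity matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def vat_order(R):
    """Return a VAT-style ordering of a symmetric dissimilarity matrix R."""
    n = R.shape[0]
    # Start from one endpoint of the largest dissimilarity in R.
    i, _ = np.unravel_index(np.argmax(R), R.shape)
    order = [int(i)]
    remaining = np.ones(n, dtype=bool)
    remaining[i] = False
    # min_dist[q]: smallest dissimilarity from q to any already ordered object.
    min_dist = R[i].astype(float).copy()
    for _ in range(n - 1):
        # Pick the unordered object closest to the ordered set (Prim-like step).
        j = int(np.argmin(np.where(remaining, min_dist, np.inf)))
        order.append(j)
        remaining[j] = False
        min_dist = np.minimum(min_dist, R[j])
    return np.array(order)

def extend_labels(X, sample_idx, sample_labels):
    """Assumed rule: each object takes the label of its nearest sampled object."""
    S = X[sample_idx]
    d = np.sqrt(((X[:, None, :] - S[None, :, :]) ** 2).sum(axis=-1))
    return np.asarray(sample_labels)[np.argmin(d, axis=1)]

# Toy data: two well-separated groups of 20 points each (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
R = pairwise_dist(X)
order = vat_order(R)
# Reordered matrix; displayed as an image, clusters appear as dark diagonal blocks.
R_star = R[np.ix_(order, order)]
# Crisp labels for all objects from a two-object labeled sample (hypothetical).
labels = extend_labels(X, [0, 20], [0, 1])
```

Rendering `R_star` as a grayscale image gives the visual cluster-tendency assessment; the label extension touches each non-sampled object only once, which is the reason a sampling-based design can stay tractable on large datasets.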

Funding

This study was funded by the Science and Engineering Research Board (Grant No. ECR/2016/001556).

Author information

Authors and Affiliations

Department of CSE, Rajeev Gandhi Memorial College of Engineering & Technology, Nandyal, Andhra Pradesh, India
K. Rajendra Prasad
Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, India
Moulana Mohammed & Dinesh Kumar Anguraj
Department of CSE, Institute of Aeronautical Engineering, Hyderabad, Telangana, India
L. V. Narasimha Prasad

Corresponding author

Correspondence to K. Rajendra Prasad.

About this article

Cite this article

Rajendra Prasad, K., Mohammed, M., Narasimha Prasad, L.V. et al. An efficient sampling-based visualization technique for big data clustering with crisp partitions. Distrib Parallel Databases 39, 813–832 (2021). https://doi.org/10.1007/s10619-021-07324-3

Accepted: 29 January 2021
Published: 19 February 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s10619-021-07324-3
