Fuzzy kernel K-medoids clustering algorithm for uncertain data objects

Tavakkol, Behnam; Son, Youngdoo

doi:10.1007/s10044-021-00983-z

Theoretical advances
Published: 26 May 2021

Pattern Analysis and Applications, Volume 24, pages 1287–1302 (2021)

Abstract

Most data mining algorithms are designed for the traditional type of data object, referred to as a certain data object: one that carries no uncertainty information and is represented by a single point. Capturing uncertainty can improve the performance of algorithms, since it may lead to more accurate results. Uncertainty in data objects can be modeled in different ways; two of the most popular are (1) representing each object by a group of points and (2) representing each object by a probability density function (pdf). Objects modeled in these ways are referred to as uncertain data objects. Fuzzy clustering is a well-established field of research for certain data. Fuzzy clustering algorithms generate degrees of membership for the assignment of objects to clusters, which makes it possible to express that an object belongs to more than one cluster. To the best of our knowledge, only one fuzzy clustering algorithm for uncertain data exists in the literature. That algorithm, however, cannot properly create non-convex clusters, and it therefore performs poorly on uncertain data sets with arbitrary-shaped clusters, that is, clusters that are non-convex, unconventional, and possibly not linearly separable. In this paper, we propose a novel fuzzy kernel K-medoids clustering algorithm for uncertain objects that works well on data sets with arbitrary-shaped clusters. Through several experiments on synthetic and real data, we show that the proposed algorithm outperforms the competitor algorithms: certain fuzzy K-medoids and uncertain fuzzy K-medoids.
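The abstract names the ingredients (uncertain objects represented as groups of points, fuzzy memberships, medoids, and a kernel to handle non-convex clusters) but not the paper's exact distance measure or update rules. The following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each uncertain object is a finite sample of points; it uses, as a stand-in dissimilarity, the squared distance between kernel mean embeddings under a Gaussian (RBF) kernel (the biased squared maximum mean discrepancy), which is not necessarily the measure used in the paper; it updates memberships and medoids in the style of fuzzy c-medoids (FCMdd); and it initializes medoids farthest-first. All function names and parameters (`object_distance`, `fuzzy_kernel_kmedoids`, `m`, `gamma`) are hypothetical.

```python
import math
import random


def rbf(x, y, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2) between two points."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_kernel(A, B, gamma):
    """Average kernel value over all pairs (a, b) with a in A and b in B."""
    return sum(rbf(a, b, gamma) for a in A for b in B) / (len(A) * len(B))


def object_distance(A, B, gamma):
    """Assumed dissimilarity: squared RKHS distance between the kernel mean
    embeddings of two uncertain objects given as sample sets (biased MMD^2)."""
    d = mean_kernel(A, A, gamma) + mean_kernel(B, B, gamma) - 2 * mean_kernel(A, B, gamma)
    return max(d, 0.0)  # clamp tiny negative values caused by rounding


def memberships(D, medoids, m):
    """FCMdd-style memberships: u_ij = 1 / sum_k (d_ij / d_ik)^(1/(m-1))."""
    U = []
    for row in D:
        d = [row[c] for c in medoids]
        zeros = [j for j, v in enumerate(d) if v == 0.0]
        if zeros:
            # Object coincides with one or more medoids: share membership among them.
            U.append([1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(d))])
        else:
            p = 1.0 / (m - 1.0)
            U.append([1.0 / sum((d[j] / d[k]) ** p for k in range(len(d)))
                      for j in range(len(d))])
    return U


def fuzzy_kernel_kmedoids(objects, k, m=2.0, gamma=1.0, max_iter=100):
    """Alternate membership and medoid updates until the medoids stop changing."""
    n = len(objects)
    D = [[object_distance(objects[i], objects[j], gamma) for j in range(n)] for i in range(n)]
    # Assumed farthest-first initialization: start at object 0, then repeatedly
    # add the object farthest from its nearest chosen medoid.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(D[i][c] for c in medoids)))
    for _ in range(max_iter):
        U = memberships(D, medoids, m)
        new_medoids = []
        for j in range(k):
            # Pick the object minimizing sum_i u_ij^m * d(i, candidate); medoids
            # already chosen for earlier clusters are skipped to keep them distinct.
            candidates = [c for c in range(n) if c not in new_medoids]
            new_medoids.append(min(candidates,
                                   key=lambda c: sum(U[i][j] ** m * D[i][c] for i in range(n))))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, memberships(D, medoids, m)


# Usage: two well-separated groups of uncertain objects; each object is
# five noisy 2-D samples drawn around its group center.
rng = random.Random(1)


def make_object(cx, cy):
    return [(cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1)) for _ in range(5)]


objects = [make_object(0.0, 0.0) for _ in range(4)] + [make_object(5.0, 5.0) for _ in range(4)]
medoids, U = fuzzy_kernel_kmedoids(objects, k=2, gamma=0.5)
```

On this toy data each group of four objects ends up with its dominant membership in a different cluster, while every membership row still sums to one, so objects near a boundary could share membership across clusters. Excluding already chosen medoids during the update is a design choice of this sketch to avoid duplicate clusters; the abstract does not specify it.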

Similar content being viewed by others

Complex Pythagorean Hesitant Fuzzy Aggregation Operators Based on Aczel-Alsina t-Norm and t-Conorm and Their Applications in Decision-Making

Article 10 April 2024

Zaifu Sun, Zeeshan Ali, … Peide Liu

K-Means algorithm based on multi-feature-induced order

Article 09 April 2024

Benting Wan, Weikang Huang, … Shufen Zhou

Parametric circular intuitionistic fuzzy information measures and multi-criteria decision making with extended TOPSIS

Article 04 April 2024

Mahmut Can Bozyiğit & Mehmet Ünver

Acknowledgements

The authors would like to thank the editor and anonymous reviewers for their valuable comments and suggestions which helped to improve the quality of this paper.

Author information

Authors and Affiliations

Business Studies Program, School of Business, Stockton University, Galloway, NJ, 08205, USA
Behnam Tavakkol
Department of Industrial and Systems Engineering, Dongguk University-Seoul, Seoul, Korea
Youngdoo Son

Corresponding author

Correspondence to Behnam Tavakkol.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tavakkol, B., Son, Y. Fuzzy kernel K-medoids clustering algorithm for uncertain data objects. Pattern Anal Applic 24, 1287–1302 (2021). https://doi.org/10.1007/s10044-021-00983-z

Received: 17 September 2020
Accepted: 29 April 2021
Published: 26 May 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s10044-021-00983-z
