Fuzzy kernel K-medoids clustering algorithm for uncertain data objects

  • Theoretical advances
Pattern Analysis and Applications

Abstract

Most data mining algorithms are designed for traditional data objects, referred to as certain data objects. A certain data object carries no uncertainty information and is represented by a single point. Capturing uncertainty can improve the performance of algorithms because it can lead to more accurate results. Uncertainty can be modeled in several ways; the two most popular are (1) representing each object by a group of points and (2) representing each object by a probability density function (pdf). Objects modeled in either way are referred to as uncertain data objects. Fuzzy clustering is a well-established field of research for certain data. Fuzzy clustering algorithms assign each object a degree of membership in every cluster, which makes it possible to express that an object belongs to more than one cluster. To the best of our knowledge, the literature contains only one fuzzy clustering algorithm for uncertain data. That algorithm, however, cannot properly form non-convex clusters, and it therefore performs poorly on uncertain data sets with arbitrary-shaped clusters, that is, clusters that are non-convex, unconventional, and possibly nonlinearly separable. In this paper, we propose a novel fuzzy kernel K-medoids clustering algorithm for uncertain objects that works well on data sets with arbitrary-shaped clusters. Experiments on synthetic and real data show that the proposed algorithm outperforms two competitor algorithms: certain fuzzy K-medoids and uncertain fuzzy K-medoids.
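To make the ingredients named in the abstract concrete, the following is a minimal sketch of a generic fuzzy K-medoids loop run on kernel-induced distances between uncertain objects. It is not the algorithm proposed in the paper, whose details are not given in this preview. Every specific choice here is an assumption made for illustration: each uncertain object is a group of sample points, the kernel between two objects is the average RBF kernel over all sample pairs, memberships follow the standard fuzzy c-means style update, and the function names, the farthest-first initialization, and the parameters `gamma` and `m` are hypothetical.

```python
import numpy as np

def object_kernel_matrix(objects, gamma=1.0):
    """Kernel between uncertain objects (assumption): mean RBF kernel over sample pairs."""
    n = len(objects)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            diff = objects[i][:, None, :] - objects[j][None, :, :]
            sq = np.sum(diff ** 2, axis=-1)
            K[i, j] = K[j, i] = np.exp(-gamma * sq).mean()
    return K

def kernel_sq_distances(K):
    """Squared distance in feature space: K_ii - 2 K_ij + K_jj."""
    diag = np.diag(K)
    return np.maximum(diag[:, None] - 2.0 * K + diag[None, :], 0.0)

def init_medoids(D, c):
    """Deterministic farthest-first start (an illustrative heuristic)."""
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < c:
        # Pick the object farthest from its nearest already-chosen medoid.
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    return np.array(medoids)

def fuzzy_memberships(D, medoids, m=2.0, eps=1e-12):
    """Membership of object i in cluster k is proportional to D[i, medoid_k]^(-1/(m-1))."""
    inv = (D[:, medoids] + eps) ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fuzzy_kernel_k_medoids(objects, c, m=2.0, gamma=1.0, max_iter=100):
    D = kernel_sq_distances(object_kernel_matrix(objects, gamma))
    medoids = init_medoids(D, c)
    for _ in range(max_iter):
        U = fuzzy_memberships(D, medoids, m)
        # cost[j, k] = sum_i U[i, k]^m * D[i, j]; D is symmetric.
        cost = D @ (U ** m)
        new_medoids = np.argmin(cost, axis=0)
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, fuzzy_memberships(D, medoids, m)
```

Calling `fuzzy_kernel_k_medoids(objects, c=2)` on a list of `(n_samples, n_features)` arrays returns the medoid indices and a membership matrix whose rows sum to one. Working only from the kernel matrix means the loop never needs explicit feature-space coordinates, which is what lets a kernel method handle clusters that are not linearly separable in the input space.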

Acknowledgements

The authors would like to thank the editor and anonymous reviewers for their valuable comments and suggestions which helped to improve the quality of this paper.

Author information

Corresponding author

Correspondence to Behnam Tavakkol.

About this article

Cite this article

Tavakkol, B., Son, Y. Fuzzy kernel K-medoids clustering algorithm for uncertain data objects. Pattern Anal Applic 24, 1287–1302 (2021). https://doi.org/10.1007/s10044-021-00983-z

  • DOI: https://doi.org/10.1007/s10044-021-00983-z
