Multipartition clustering of mixed data with Bayesian networks
International Journal of Intelligent Systems (IF 5.0), Pub Date: 2021-11-30, DOI: 10.1002/int.22770
Fernando Rodriguez‐Sanchez, Concha Bielza, Pedro Larrañaga

Real-world applications often involve multifaceted data with several reasonable interpretations. To cluster this data, we need methods that are able to produce multiple clustering solutions. To this purpose, it is interesting to learn a finite mixture model with multiple latent variables, where each latent variable represents a unique way to partition the data. However, although there is an extensive literature on multipartition clustering methods for categorical data and for continuous data, there is a lack of work for mixed data. In this paper, we propose a multipartition clustering method that is able to efficiently deal with mixed data by exploiting the Bayesian network factorization and the variational Bayes framework. We show the flexibility and applicability of the proposed method by solving clustering, density estimation, and missing data imputation tasks in real-world data sets. For reproducibility, all code, data, and results can be found in the following public repository: https://github.com/ferjorosa/mpc-mixed.

Updated: 2021-11-30