A clustering algorithm based on the weighted entropy of conditional attributes for mixed data
Concurrency and Computation: Practice and Experience (IF 2), Pub Date: 2021-03-31, DOI: 10.1002/cpe.6293
Jing Zhou¹, Ke Chen¹, Jinsheng Liu²

A novel definition of weighted entropy is proposed to improve clustering performance on small, diverse datasets. First, intra-class and inter-class weighted entropies are developed separately for categorical and numeric conditional attributes, based on the mathematical definition of entropy. Second, these weighted entropies are used to compute cluster weights for mixed conditional attributes. Third, a weighted clustering algorithm is introduced that uses entropy as its primary descriptive term and integrates a corresponding distance measure. Finally, a theoretical analysis and validation experiments were conducted on datasets from the UC Irvine (UCI) Machine Learning Repository. The results show that the proposed algorithm is highly self-adaptive and achieves better clustering performance than the existing K-prototypes, SBAC, and OCIL algorithms.
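The abstract does not give the exact formulas for the intra-class and inter-class weighted entropies, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It substitutes a common stand-in: each attribute is weighted by its information gain, H(attribute) - H(attribute | cluster), with numeric attributes discretized into equal-width bins. The weighted mixed distance (range-normalized squared difference for numeric attributes, 0/1 mismatch for categorical ones) and the K-prototypes-style alternating loop are likewise assumptions. All function names (`entropy_weights`, `mixed_distance`, `weighted_entropy_cluster`, etc.) and parameters such as `bins` and `init` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (natural log) of a sequence of hashable values."""
    n = len(values)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def discretize(column, bins):
    """Map numeric values to equal-width bin indices (assumed binning scheme)."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), bins - 1) for v in column]


def entropy_weights(data, labels, numeric, bins=4):
    """Weight attribute j by H(attr_j) - H(attr_j | cluster), normalized to sum 1.

    Assumed stand-in for the paper's intra-/inter-class weighted entropy: low
    within-cluster entropy relative to overall entropy means the attribute
    separates the clusters well, so it earns a larger weight.
    """
    n, m = len(data), len(data[0])
    gains = []
    for j in range(m):
        col = [row[j] for row in data]
        if j in numeric:
            col = discretize(col, bins)
        total = shannon_entropy(col)  # spread over the whole dataset
        within = 0.0
        for c in set(labels):
            members = [col[i] for i in range(n) if labels[i] == c]
            within += len(members) / n * shannon_entropy(members)  # size-weighted intra-cluster spread
        gains.append(max(total - within, 0.0))
    s = sum(gains)
    return [g / s for g in gains] if s > 0 else [1.0 / m] * m


def mixed_distance(x, proto, weights, numeric, ranges):
    """Weighted distance: range-normalized squared gap (numeric) or 0/1 mismatch (categorical)."""
    d = 0.0
    for j, w in enumerate(weights):
        if j in numeric:
            d += w * ((x[j] - proto[j]) / ranges[j]) ** 2
        else:
            d += w * (x[j] != proto[j])
    return d


def update_prototype(members, numeric):
    """Mean for numeric attributes, mode for categorical ones (K-prototypes style)."""
    proto = []
    for j in range(len(members[0])):
        col = [r[j] for r in members]
        proto.append(sum(col) / len(col) if j in numeric else Counter(col).most_common(1)[0][0])
    return proto


def weighted_entropy_cluster(data, k, numeric, init, max_iter=20):
    """Alternate assignment, prototype update, and entropy re-weighting until labels stop changing."""
    m = len(data[0])
    ranges = {}
    for j in numeric:
        col = [row[j] for row in data]
        ranges[j] = (max(col) - min(col)) or 1.0
    protos = [list(data[i]) for i in init]  # initial prototypes chosen by the caller
    weights = [1.0 / m] * m  # start from uniform attribute weights
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: mixed_distance(x, protos[c], weights, numeric, ranges))
               for x in data]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [data[i] for i in range(len(data)) if labels[i] == c]
            if members:  # keep the old prototype if a cluster empties
                protos[c] = update_prototype(members, numeric)
        weights = entropy_weights(data, labels, numeric)
    return labels, weights


# Toy mixed data: (numeric size, informative color, uninformative tag)
data = [
    (1.0, "red", "x"), (1.2, "red", "y"), (0.9, "red", "x"), (1.1, "red", "y"),
    (8.0, "blue", "x"), (8.3, "blue", "y"), (7.9, "blue", "x"), (8.1, "blue", "y"),
]
labels, weights = weighted_entropy_cluster(data, k=2, numeric={0}, init=[0, 4])
print(labels, weights)
```

On this toy data the tag column is split evenly inside both clusters, so its information gain is zero and it receives weight 0.0, while the size and color attributes share the weight equally. This illustrates, under the assumptions above, how entropy-derived weights can suppress uninformative attributes; the abstract does not say whether the paper's weighting behaves the same way.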

Updated: 2021-03-31