Comparison of Similarity Measures for Categorical Data in Hierarchical Clustering, Journal of Classification

Journal of Classification (IF 1.8), Pub Date: 2019-04-01, DOI: 10.1007/s00357-019-09317-5
Zdeněk Šulc, Hana Řezanková

This paper examines similarity measures for categorical data in hierarchical clustering that can handle variables with more than two categories and that aim to replace the simple matching approach commonly used in this area. These measures take into account additional characteristics of a dataset, such as the frequency distribution of categories or the number of categories of a given variable. The paper has two main aims: first, to compare and evaluate the selected similarity measures with respect to the quality of the clusters they produce in hierarchical clustering; second, to propose new similarity measures for nominal variables. All examined measures are compared on cluster quality using the mean ranked scores of two internal evaluation coefficients. The analysis is performed on generated datasets, which makes it possible to determine the particular situations in which a given similarity measure is recommended.
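To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It compares simple matching with one frequency-aware measure from the wider literature, the inverse occurrence frequency (IOF). The abstract does not name the measures it examines, so the choice of IOF is an assumption, as are the toy dataset, the conversion of similarity to dissimilarity as `1 - similarity`, the use of average linkage, and all function names.

```python
import math
from collections import Counter
from itertools import combinations


def simple_matching(x, y, freqs=None):
    """Share of variables on which two objects take the same category."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def iof(x, y, freqs):
    """Inverse occurrence frequency (assumed example of a frequency-aware
    measure): a match scores 1, a mismatch scores less the more frequent
    the two mismatching categories are."""
    total = 0.0
    for k, (a, b) in enumerate(zip(x, y)):
        if a == b:
            total += 1.0
        else:
            total += 1.0 / (1.0 + math.log(freqs[k][a]) * math.log(freqs[k][b]))
    return total / len(x)


def dissimilarity_matrix(data, measure):
    """Pairwise dissimilarities, taken here as 1 - similarity (an assumption)."""
    freqs = [Counter(column) for column in zip(*data)]  # category counts per variable
    n = len(data)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = 1.0 - measure(data[i], data[j], freqs)
    return d


def average_linkage(d, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    with the smallest mean pairwise dissimilarity."""
    clusters = [[i] for i in range(len(d))]

    def mean_dist(pair):
        a, b = pair
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        return sum(d[i][j] for i, j in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2), key=mean_dist)
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


# Hypothetical toy dataset: three nominal variables, each with three categories.
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("green", "large", "square"),
]

for measure in (simple_matching, iof):
    clusters = average_linkage(dissimilarity_matrix(data, measure), n_clusters=2)
    print(measure.__name__, clusters)
```

Simple matching treats every mismatch alike, whereas IOF uses the category counts of each variable, which is the kind of additional dataset characteristic the abstract refers to. Comparing the resulting partitions with internal evaluation coefficients, as the paper does, is outside the scope of this sketch.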

Updated: 2019-04-01
