Suboptimal Comparison of Partitions
Journal of Classification (IF 1.8), Pub Date: 2019-07-11, DOI: 10.1007/s00357-019-09329-1
Jonathon J. O’Brien, Michael T. Lawson, Devin K. Schweppe, Bahjat F. Qaqish

The distinction between classification and clustering is often based on a priori knowledge of classification labels. However, in the purely theoretical situation where a data-generating model is known, the optimal solutions for clustering do not necessarily correspond to optimal solutions for classification. Exploring this divergence leads us to conclude that no standard measures of either internal or external validation can guarantee a correspondence with optimal clustering performance. We provide recommendations for the suboptimal evaluation of clustering performance. Such suboptimal approaches can provide valuable insight to researchers hoping to add a post hoc interpretation to their clusters. Indices based on pairwise linkage provide the clearest probabilistic interpretation, while a triplet-based index yields information on higher-level structures in the data. Finally, a graphical examination of receiver operating characteristics generated from hierarchical clustering dendrograms can convey information that would be lost in any single-number summary.
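
To make the idea of a pairwise-linkage index concrete, the following minimal sketch computes the classical Rand index between two partitions of the same items. The abstract does not specify which pairwise indices the authors recommend, so the Rand index is used here only as a familiar illustration, and the function names `pair_counts` and `rand_index` and the toy labels are assumptions introduced for this example. The probabilistic reading is direct: the value is the probability that a pair of items drawn uniformly at random is treated the same way (linked in both partitions or separated in both).

```python
from itertools import combinations


def pair_counts(labels_a, labels_b):
    """Tally item pairs by whether each partition links them (same cluster).

    Returns (both, only_a, only_b, neither). Hypothetical helper for
    illustration; not taken from the paper.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same items")
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(labels_a)), 2):
        linked_a = labels_a[i] == labels_a[j]
        linked_b = labels_b[i] == labels_b[j]
        if linked_a and linked_b:
            both += 1
        elif linked_a:
            only_a += 1
        elif linked_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which the two partitions agree."""
    both, only_a, only_b, neither = pair_counts(labels_a, labels_b)
    total = both + only_a + only_b + neither
    if total == 0:
        raise ValueError("need at least two items to form a pair")
    return (both + neither) / total


# Toy data (assumed for illustration): six items, two labelings.
reference = [0, 0, 0, 1, 1, 1]
clustering = [0, 0, 1, 1, 2, 2]

print(pair_counts(reference, clustering))  # (2, 4, 1, 8)
print(rand_index(reference, clustering))   # 10 of 15 pairs agree
```

The four counts also yield pairwise precision and recall (for example, `both / (both + only_b)`), which is one way such indices can be related to the receiver operating characteristics mentioned in the abstract.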

Updated: 2019-07-11