Validation of cluster analysis results on validation data: A systematic framework
WIREs Data Mining and Knowledge Discovery (IF 7.8), Pub Date: 2021-12-23, DOI: 10.1002/widm.1444
Theresa Ullmann, Christian Hennig, Anne-Laure Boulesteix

Cluster analysis refers to a wide range of data analytic techniques for class discovery and is popular in many application fields. To assess the quality of a clustering result, different cluster validation procedures have been proposed in the literature. While there is extensive work on classical validation techniques, such as internal and external validation, less attention has been given to validating and replicating a clustering result using a validation dataset. Such a dataset may be part of the original dataset, which is separated before analysis begins, or it could be an independently collected dataset. We present a systematic, structured review of the existing literature about this topic. For this purpose, we outline a formal framework that covers most existing approaches for validating clustering results on validation data. In particular, we review classical validation techniques such as internal and external validation, stability analysis, and visual validation, and show how they can be interpreted in terms of our framework. We define and formalize different types of validation of clustering results on a validation dataset, and give examples of how clustering studies from the applied literature that used a validation dataset can be seen as instances of our framework.
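
To make the idea of checking a clustering on a validation dataset concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the framework defined in the paper: the data are synthetic, k-means (with a simple farthest-first initialisation) stands in for whatever clustering method a study might use, and the helper names (`kmeans`, `nearest`, `validate_on_holdout`) are hypothetical. The data are split before any analysis, both halves are clustered, and the validation half's own clustering is compared with the partition obtained by transferring the discovery centroids to it, using the adjusted Rand index.

```python
import random
from collections import Counter
from math import comb, dist


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: dist(point, centroids[j]))


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means with farthest-first initialisation (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(
                tuple(sum(x) / len(members) for x in zip(*members)) if members else centroids[j]
            )
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same objects."""
    total = comb(len(a), 2)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def validate_on_holdout(discovery, validation, k):
    """Compare the validation data's own clustering with the transferred one."""
    centroids_d, _ = kmeans(discovery, k)
    _, labels_own = kmeans(validation, k)
    labels_transferred = [nearest(p, centroids_d) for p in validation]
    return adjusted_rand_index(labels_own, labels_transferred)


# Synthetic example: two well-separated Gaussian blobs, split before analysis.
rng = random.Random(0)
data = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
        for cx, cy in [(0.0, 0.0), (20.0, 20.0)] for _ in range(100)]
rng.shuffle(data)
discovery, validation = data[:100], data[100:]

ari = validate_on_holdout(discovery, validation, k=2)
print(f"ARI between own and transferred clustering: {ari:.3f}")
```

An index near 1 suggests that the structure found on the discovery data is reproduced on the validation data; values near 0 suggest agreement no better than chance. Other comparison strategies (for example internal indices computed on the validation data) would fit the same split-then-compare pattern.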

Updated: 2021-12-23