U-statistical inference for hierarchical clustering, Journal of Computational and Graphical Statistics

U-statistical inference for hierarchical clustering
Journal of Computational and Graphical Statistics (IF 2.4). Pub Date: 2020-08-25, DOI: 10.1080/10618600.2020.1796398
Marcio Valk, Gabriela Bettella Cybis

Clustering methods are a valuable tool for identifying patterns in high-dimensional data, with applications to many scientific problems. However, quantifying uncertainty in clustering is challenging, particularly for High Dimension Low Sample Size (HDLSS) data. Here we develop a U-statistics-based clustering approach that assesses statistical significance in clustering and is specifically tailored to HDLSS scenarios. These non-parametric methods rely on very few assumptions about the data and can therefore be applied to a wide range of datasets for which the Euclidean distance captures relevant features. We propose two significance clustering algorithms: a hierarchical method and a non-nested version. To do so, we first propose an extension of a relevant U-statistic and develop its asymptotic theory. Our methods are tested through extensive simulations and found to be more powerful than competing alternatives. They are further showcased in two applications, in genetics and in image recognition.
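The abstract does not spell out the test statistic, so the following is only a rough illustration of the general idea of judging a candidate split by comparing average Euclidean distances within and between groups with U-statistics. It is a minimal sketch under stated assumptions, not the authors' method: the names `separation_stat` and `permutation_pvalue` are hypothetical, it uses squared Euclidean distance, and it calibrates with a label-permutation test instead of the asymptotic theory developed in the paper. It also treats the two-group split as fixed in advance, ignoring that in clustering the split is chosen from the same data.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_within(points):
    """U-statistic: mean squared distance over all distinct pairs in one group."""
    pairs = list(combinations(points, 2))
    return sum(sq_dist(a, b) for a, b in pairs) / len(pairs)


def mean_between(g1, g2):
    """Mean squared distance over all cross-group pairs."""
    return sum(sq_dist(a, b) for a in g1 for b in g2) / (len(g1) * len(g2))


def separation_stat(data, labels):
    """Hypothetical statistic 2*theta_12 - theta_11 - theta_22 (labels 0/1).

    Each theta estimates an expected squared distance without bias, so the
    statistic is unbiased for 2 * ||mu_1 - mu_2||^2 (zero when the means match).
    Each group needs at least two points.
    """
    g1 = [x for x, lab in zip(data, labels) if lab == 0]
    g2 = [x for x, lab in zip(data, labels) if lab == 1]
    return 2 * mean_between(g1, g2) - mean_within(g1) - mean_within(g2)


def permutation_pvalue(data, labels, n_perm=999, seed=0):
    """Permutation p-value for H0: both groups share one distribution.

    Shuffling the labels keeps the group sizes fixed.
    """
    rng = random.Random(seed)
    observed = separation_stat(data, labels)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if separation_stat(data, perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    dim = 200  # far more features than observations (HDLSS-style)
    a = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(5)]
    b = [[rng.gauss(1.0, 1.0) for _ in range(dim)] for _ in range(5)]
    stat, p = permutation_pvalue(a + b, [0] * 5 + [1] * 5)
    print(f"statistic={stat:.1f}, p={p:.3f}")
```

With two groups of five there are only 252 distinct labelings, so a well-separated split is reached by a random relabeling (or its 0/1 swap) with probability about 2/252, which keeps the permutation p-value small. A hierarchical variant could, in principle, apply such a test at each candidate split of a dendrogram; that too is an illustrative reading of the abstract, not the published algorithm.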

Updated: 2020-08-25