Clustering with statistical error control, Scandinavian Journal of Statistics


Clustering with statistical error control
Scandinavian Journal of Statistics (IF 1), Pub Date: 2020-02-09, DOI: 10.1111/sjos.12450
Michael Vogt, Matthias Schmid

This article presents a clustering approach that allows for rigorous statistical error control similar to a statistical test. We develop estimators for both the unknown number of clusters and the clusters themselves. The estimators depend on a tuning parameter α which is similar to the significance level of a statistical hypothesis test. By choosing α, one can control the probability of overestimating the true number of clusters, while the probability of underestimation is asymptotically negligible. In addition, the probability that the estimated clusters differ from the true ones is controlled. In the theoretical part of the article, formal versions of these statements on statistical error control are derived in a baseline model with convex clusters. A simulation study and two applications to temperature and gene expression microarray data complement the theoretical analysis.
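Read formally, the error-control statements above might look roughly as follows. This is only a sketch under assumed notation: the symbols K_0 (true number of clusters), \hat{K} (its estimator), G_k (true clusters) and \hat{G}_k (estimated clusters) are introduced here for illustration and do not appear in the abstract, and the exact form of the bounds and the asymptotic regime are those derived in the article itself.

```latex
% Assumed notation (not from the abstract): K_0 is the true number of clusters,
% \hat{K} its estimator, G_k the true clusters, \hat{G}_k the estimated clusters.
\begin{align*}
  P\big(\hat{K} > K_0\big) &\le \alpha + o(1)
    && \text{(overestimation is controlled by } \alpha\text{)} \\
  P\big(\hat{K} < K_0\big) &= o(1)
    && \text{(underestimation is asymptotically negligible)} \\
  P\big(\{\hat{G}_1,\dots,\hat{G}_{\hat{K}}\} \ne \{G_1,\dots,G_{K_0}\}\big) &\le \alpha + o(1)
    && \text{(estimated clusters differ from the true ones)}
\end{align*}
```

In this reading, α plays the same role as the significance level of a test: a smaller α makes it less likely that spurious extra clusters are reported.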

Updated: 2020-02-09
