Note: t for Two (Clusters)
Journal of Classification (IF 1.8), Pub Date: 2019-07-11, DOI: 10.1007/s00357-019-09335-3
Stanley L. Sclove

Abstract: The computation for cluster analysis is done by iterative algorithms. But here, a straightforward, non-iterative procedure is presented for clustering in the special case of one variable and two groups. The method is univariate but may reasonably be applied to multivariate datasets when the first principal component or a single factor explains much of the variation in the data. The t method is motivated by the fact that minimizing the within-groups sum of squares is equivalent to maximizing the between-groups sum of squares, and that Student’s t statistic measures the between-groups difference in means relative to within-groups variation. That is, the t statistic is the difference in sample means divided by the standard error of this difference. So, maximizing the t statistic is developed as a method for clustering univariate data into two clusters. In this situation, the t method gives the same results as the K-means algorithm. K-means tacitly assumes equality of variances; here, however, with t, equality of variances need not be assumed because separate variances may be used in computing t. The t method is applied to some datasets; the results are compared with those obtained by fitting mixtures of distributions.
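
The abstract does not spell out the search procedure, so the following Python sketch is only an illustration of the idea of maximizing t with separate variances. It assumes that the candidate clusterings are the contiguous splits of the sorted data and that each cluster keeps at least two observations so a sample variance exists. The function name `t_split` and the parameter `min_size` are hypothetical and do not come from the paper.

```python
import math
import statistics


def t_split(values, min_size=2):
    """Split univariate data into two clusters by maximizing Welch's t.

    Assumption (not from the abstract): only contiguous splits of the
    sorted data are considered, and each cluster keeps at least
    `min_size` points so that a sample variance can be computed.
    """
    if min_size < 2:
        raise ValueError("min_size must be at least 2 to estimate a variance")
    xs = sorted(values)
    n = len(xs)
    if n < 2 * min_size:
        raise ValueError("need at least 2 * min_size observations")

    best_t, best_k = -math.inf, None
    for k in range(min_size, n - min_size + 1):
        left, right = xs[:k], xs[k:]
        # Difference in sample means (non-negative because xs is sorted).
        diff = statistics.fmean(right) - statistics.fmean(left)
        # Separate (unpooled) variances, so equal variances are not assumed.
        se2 = (statistics.variance(left) / len(left)
               + statistics.variance(right) / len(right))
        if se2 > 0:
            t = diff / math.sqrt(se2)
        else:
            # Both groups are constant: t is unbounded unless the means coincide.
            t = math.inf if diff > 0 else 0.0
        if t > best_t:
            best_t, best_k = t, k
    return best_t, xs[:best_k], xs[best_k:]


# Hypothetical toy data, chosen only to show the call.
data = [1.0, 1.2, 0.9, 1.1, 5.0, 5.3, 4.8, 5.1]
t, low, high = t_split(data)
print(t, low, high)
```

Each candidate cut point is evaluated exactly once, so nothing is iterated to convergence. Recomputing the means and variances from scratch at every cut keeps the sketch short at the cost of quadratic running time; running sums would bring that down if it mattered.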

Updated: 2019-07-11