Shannon’s entropy of partitions determined by hierarchical clustering trees in asymmetry and dimension identification
Communications in Statistics - Simulation and Computation (IF 0.9), Pub Date: 2020-07-06
J. S. Corredor, A. J. Quiroz

In the multivariate statistics community, it is commonly acknowledged that, among hierarchical clustering tree (HCT) procedures, the single linkage rule for inter-cluster distance tends to produce trees that are significantly more asymmetric than those obtained using other rules, such as complete linkage. We consider the use of Shannon's entropy of the partitions determined by HCTs as a measure of the asymmetry of the clustering trees. In a different direction, our simulations show an unexpected relationship between Shannon's entropy of partitions and the dimension of the data. Based on this observation, a procedure for intrinsic dimension identification based on entropy of partitions is proposed and studied. A theoretical result is established for the dimension identification method, stating that, locally, for continuous data on a d-dimensional manifold, the entropy of partitions behaves as if the local data were uniformly sampled from the unit ball of R^d. Evaluation on simulated examples shows that the proposed method compares favorably with other procedures for dimension identification available in the literature.
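The idea of using partition entropy as an asymmetry measure can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not say how the tree is cut, so the cut into at most `k` clusters, the natural-log entropy, the sample size, and the helper names `partition_entropy` and `hct_partition_entropy` are all assumptions made for illustration. The sketch builds single-linkage and complete-linkage trees with SciPy and computes the Shannon entropy of the cluster-size proportions of each cut.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def partition_entropy(labels):
    """Shannon entropy (in nats) of the cluster-size proportions of a partition."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def hct_partition_entropy(X, method, k):
    """Build an HCT with the given linkage rule, cut it into at most k
    clusters, and return the entropy of the resulting partition.
    (The cutting rule is an illustrative assumption, not taken from the paper.)"""
    Z = linkage(X, method=method)
    labels = fcluster(Z, t=k, criterion="maxclust")
    return partition_entropy(labels)


# Hypothetical data: 200 points from a standard normal in the plane.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))

h_single = hct_partition_entropy(X, "single", 4)
h_complete = hct_partition_entropy(X, "complete", 4)
print(f"single linkage:   {h_single:.3f}")
print(f"complete linkage: {h_complete:.3f}")
```

A partition into k clusters has entropy at most log k, attained when all clusters have equal size. A very unbalanced cut, such as one large cluster plus a few singletons, gives a value near zero, which is why low entropy can be read as a sign of an asymmetric tree.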




Updated: 2020-07-06