SMLSOM: The shrinking maximum likelihood self-organizing map
arXiv - CS - Information Retrieval Pub Date : 2021-04-28 , DOI: arxiv-2104.13971
Ryosuke Motegi, Yoichi Seki

Determining the number of clusters in a dataset is a fundamental issue in data clustering. Many methods have been proposed for selecting the number of clusters, treating it as a model selection problem. This paper proposes a greedy algorithm that automatically selects a suitable number of clusters within a probability distribution model framework. The algorithm has two components. First, it introduces a generalization of Kohonen's self-organizing map (SOM) in which each node is linked to a probability distribution model, so that the winner is searched for based on the likelihood of each node. Second, the proposed method uses a graph structure and a neighborhood defined by the length of the shortest path between nodes, in contrast to Kohonen's SOM, in which the nodes are fixed in Euclidean space. This design makes it possible to update the graph structure by cutting links to weakly connected nodes, which avoids unnecessary node deletion. The weakness of a node connection is measured using the Kullback-Leibler divergence, and the redundancy of a node is measured by the minimum description length (MDL). This updating step makes it easy to determine a suitable number of clusters. Compared with existing methods, our proposed method is computationally efficient and can accurately select the number of clusters and perform clustering.
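The building blocks named in the abstract (likelihood-based winner search, a neighborhood defined by shortest-path length on a graph, and a Kullback-Leibler measure of link weakness) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the isotropic Gaussian node model, the function names, and the toy data are all assumptions made for illustration, and the MDL-based node removal step is not shown.

```python
import math
from collections import deque


def gaussian_loglik(x, mean, var):
    """Log-density of x under an isotropic Gaussian N(mean, var * I) (assumed node model)."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (d * math.log(2 * math.pi * var) + sq_dist / var)


def find_winner(x, nodes):
    """Index of the node whose model gives x the highest log-likelihood."""
    return max(range(len(nodes)), key=lambda k: gaussian_loglik(x, *nodes[k]))


def graph_distances(adjacency, source):
    """Shortest-path length (number of links) from source to each reachable node, via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def kl_isotropic(p, q):
    """KL(p || q) for two isotropic Gaussians, each given as (mean, var)."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    d = len(mean_p)
    sq_dist = sum((a - b) ** 2 for a, b in zip(mean_p, mean_q))
    return 0.5 * (d * var_p / var_q + sq_dist / var_q - d + d * math.log(var_q / var_p))


# Hypothetical toy map: three nodes linked in a chain 0 - 1 - 2.
nodes = [((0.0, 0.0), 1.0), ((5.0, 5.0), 1.0), ((5.5, 5.0), 4.0)]
adjacency = {0: [1], 1: [0, 2], 2: [1]}

x = (4.8, 5.1)
winner = find_winner(x, nodes)                  # node with the highest likelihood for x
neighbors = graph_distances(adjacency, winner)  # graph distance replaces grid distance
link_weakness = kl_isotropic(nodes[0], nodes[1])  # a large value suggests a weak link
```

In this sketch, a link whose endpoint models have a large divergence would be a candidate for cutting, after which `graph_distances` no longer reaches the disconnected node from the winner. The threshold for cutting and the exact divergence used are left open here because the abstract does not specify them.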

Updated: 2021-04-30