Clustering of graphs using pseudo-guided random walk
Journal of Computational Science (IF 3.1), Pub Date: 2021-01-23, DOI: 10.1016/j.jocs.2020.101281
Zahid Halim , Hussain Mahmood Sargana , Aadam , Uzma , Muhammad Waqas

Clustering is an unsupervised learning task that models data as coherent groups. Multiple approaches have been proposed in the past to cluster large volumes of data. Graphs provide a logical mapping of many real-world datasets and are rich enough to reflect the peculiarities of numerous domains. Apart from k-means, k-medoids, and other well-known clustering algorithms, the use of random walk-based approaches to cluster data is a prominent area of data mining research. The Markov clustering algorithm and limited random walk-based clustering are prominent techniques that utilize the concept of a random walk. The main goal of this work is to cluster graphs using an efficient random walk-based method. A novel approach to walking a graph is presented here that determines the weights of the edges and the degrees of the nodes. This information is used by the pseudo-guidance model to guide the random walk procedure. This work introduces the friends-of-friends concept into the random walk process so that the edge weights are determined using an inclusive criterion. This concept enables a random walk to be initiated from the highest-degree node. The random walk continues until the walking agent cannot find any unvisited neighbor(s). The agent walks to its neighbors if it finds a weight of one or more; otherwise, the agent's stopping criterion is met. The nodes visited in this walk form a cluster. Once a walk comes to a halt, the visited nodes are removed from the original graph and the next walk starts in the remaining graph. This process continues until all nodes of the graph are traversed. The focus of this work remains random walk-based clustering of graphs. The proposed approach is evaluated on 18 real-world benchmark datasets using six cluster validity indices, namely the Davies-Bouldin index (DBI), Dunn index (DI), Silhouette coefficient (SC), Calinski-Harabasz index (CHI), modularity index, and normalized cut.
This proposal is compared with seven closely related approaches from the same domain, namely limited random walk, pairwise clustering, personalized PageRank clustering, GAKH (genetic algorithm krill herd) graph clustering, mixing time of random walks, density-based clustering of large probabilistic graphs, and Walktrap. Experiments suggest that this work performs better on the evaluation metrics.
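The abstract describes the walk-and-remove procedure only in prose. The following is a minimal Python sketch of that loop under stated assumptions, not the authors' implementation. It assumes that the friends-of-friends edge weight is the number of common neighbors of the edge's two endpoints, that "pseudo-guidance" means choosing the next step with probability proportional to that weight, and that degrees and weights are recomputed on the remaining graph after each cluster is removed. The names `fof_weight`, `walk_cluster`, `pseudo_guided_clustering`, and `two_cliques` are hypothetical. A standard Newman-Girvan `modularity` function is included to score the resulting partition, since modularity is one of the six indices named above; the paper's exact formulation may differ.

```python
import random
from collections import defaultdict

Graph = dict[int, set[int]]  # adjacency sets of an undirected, unweighted graph


def fof_weight(adj: Graph, u: int, v: int) -> int:
    """Hypothetical friends-of-friends weight: number of common neighbors of u and v."""
    return len(adj[u] & adj[v])


def induced(adj: Graph, keep: set[int]) -> Graph:
    """Subgraph induced by the nodes in `keep`."""
    return {n: adj[n] & keep for n in keep}


def walk_cluster(adj: Graph, rng: random.Random) -> set[int]:
    """One walk: start at the highest-degree node, stop when no eligible unvisited neighbor remains."""
    start = max(sorted(adj), key=lambda n: len(adj[n]))  # ties go to the smallest node id
    visited = {start}
    current = start
    while True:
        # Eligible steps (assumed): unvisited neighbors whose edge weight is at least one.
        candidates = [v for v in sorted(adj[current] - visited)
                      if fof_weight(adj, current, v) >= 1]
        if not candidates:
            return visited  # stopping criterion met; visited nodes form one cluster
        weights = [fof_weight(adj, current, v) for v in candidates]
        current = rng.choices(candidates, weights=weights, k=1)[0]  # weight-biased step
        visited.add(current)


def pseudo_guided_clustering(adj: Graph, seed: int = 0) -> list[set[int]]:
    """Repeat walks on the remaining graph until every node belongs to a cluster."""
    rng = random.Random(seed)
    remaining = set(adj)
    clusters = []
    while remaining:
        cluster = walk_cluster(induced(adj, remaining), rng)
        clusters.append(cluster)
        remaining -= cluster  # remove visited nodes; the next walk runs on the rest
    return clusters


def modularity(adj: Graph, clusters: list[set[int]]) -> float:
    """Newman-Girvan modularity: sum over clusters c of L_c/m - (d_c / 2m)^2."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    q = 0.0
    for c in clusters:
        intra = sum(len(adj[n] & c) for n in c) / 2  # edges with both ends in c
        degree_sum = sum(len(adj[n]) for n in c)
        q += intra / m - (degree_sum / (2 * m)) ** 2
    return q


def two_cliques() -> Graph:
    """Hypothetical toy graph: 4-cliques {0..3} and {4..7} joined by the bridge edge 3-4."""
    edges = [(a, b) for group in (range(0, 4), range(4, 8))
             for a in group for b in group if a < b]
    edges.append((3, 4))
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


if __name__ == "__main__":
    g = two_cliques()
    parts = pseudo_guided_clustering(g, seed=0)
    print([sorted(c) for c in parts])       # [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(round(modularity(g, parts), 4))   # 0.4231
```

On the toy graph, the bridge edge 3-4 has no common neighbors, so its assumed weight is 0 and no walk crosses it. Each 4-clique therefore becomes one cluster, giving a modularity of 11/26 ≈ 0.4231. Choosing the next step in proportion to the weight (rather than always taking the heaviest edge) keeps the walk random while biasing it toward densely connected neighbors, which is one plausible reading of "pseudo-guided".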

Updated: 2021-01-29