Computing Graph Descriptors on Edge Streams
arXiv - CS - Data Structures and Algorithms, Pub Date: 2021-09-02, DOI: arxiv-2109.01494
Zohair Raza Hassan, Imdadullah Khan, Mudassir Shabbir, Waseem Abbas

Graph feature extraction is a fundamental task in graph analytics. Using feature vectors (graph descriptors) in tandem with data mining algorithms that operate on Euclidean data, one can solve problems such as classification, clustering, and anomaly detection on graph-structured data. This idea has proved fruitful in the past, with spectral-based graph descriptors providing state-of-the-art classification accuracy on benchmark datasets. However, these algorithms do not scale to large graphs because 1) they require storing the entire graph in memory, and 2) the end user has no control over the algorithm's runtime. In this paper, we present single-pass streaming algorithms to approximate structural features of graphs (counts of subgraphs of order $k \geq 4$). Operating on edge streams allows us to avoid keeping the entire graph in memory, and controlling the sample size enables us to control the time taken by the algorithm. We demonstrate the efficacy of our descriptors by analyzing the approximation error, classification accuracy, and scalability to massive graphs. Our experiments showcase the effect of the sample size on approximation error and predictive accuracy. The proposed descriptors can be computed on graphs with millions of edges within minutes and outperform the state-of-the-art descriptors in classification accuracy.
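The abstract does not spell out the algorithms, so the following is only a generic sketch of the idea it describes (one pass over an edge stream, a fixed-size edge sample, and an estimated subgraph count), not the authors' method. It assumes a simple undirected graph streamed as `(u, v)` tuples with no duplicate edges, keeps a uniform edge sample with reservoir sampling (Vitter's Algorithm R), and estimates the number of (non-induced) 3-stars, a 4-node subgraph with 3 edges. A fixed 3-edge set survives in a uniform sample of `capacity` out of `seen` edges with probability $p = \prod_{i=0}^{2} \frac{\text{capacity} - i}{\text{seen} - i}$, so dividing the count in the sample by $p$ gives an unbiased estimate. All function names here are hypothetical.

```python
import random
from collections import defaultdict
from math import comb


def reservoir_sample_edges(stream, capacity, seed=None):
    """Single pass over the edge stream, keeping a uniform sample of <= capacity edges."""
    rng = random.Random(seed)
    sample = []
    seen = 0
    for edge in stream:
        seen += 1
        if len(sample) < capacity:
            sample.append(edge)
        else:
            # Keep the new edge with probability capacity / seen (Algorithm R).
            j = rng.randrange(seen)
            if j < capacity:
                sample[j] = edge
    return sample, seen


def prob_all_sampled(j, capacity, seen):
    """Probability that a fixed set of j edges is entirely in the sample."""
    if seen <= capacity:
        return 1.0  # every edge was kept
    p = 1.0
    for i in range(j):
        p *= (capacity - i) / (seen - i)
    return p


def estimate_3_stars(stream, capacity, seed=None):
    """Hypothetical estimator: scaled count of 3-stars (claws) in the edge sample."""
    if capacity < 3:
        raise ValueError("capacity must be at least 3 to hold a 3-edge subgraph")
    sample, seen = reservoir_sample_edges(stream, capacity, seed)
    degree = defaultdict(int)
    for u, v in sample:
        degree[u] += 1
        degree[v] += 1
    # A vertex of degree d is the center of C(d, 3) stars with 3 leaves.
    sampled_count = sum(comb(d, 3) for d in degree.values())
    return sampled_count / prob_all_sampled(3, capacity, seen)


# A star with center 0 and 5 leaves fits in the sample, so the count is exact.
print(estimate_3_stars([(0, i) for i in range(1, 6)], capacity=100))  # 10.0
```

Here `capacity` is the memory and time knob: a smaller sample is cheaper but gives a noisier estimate. This mirrors the sample-size trade-off the abstract reports for approximation error and predictive accuracy.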

Updated: 2021-09-06