An analysis of the graph processing landscape, Journal of Big Data


Journal of Big Data (IF 8.6), Pub Date: 2021-04-09, DOI: 10.1186/s40537-021-00443-9
Miguel E Coimbra _{1,2}, Alexandre P Francisco _{1,2}, Luís Veiga _{1,2}


The value of graph-based big data can be unlocked by exploring the topology and metrics of the networks it represents, and the computational approaches to this exploration take many forms. To perform global computations over a graph, the graph is first ingested into a graph processing system from one of many digital representations. Extracting information from graphs involves processing all their elements globally, which can be done with single-machine systems (with varying approaches to hardware usage), distributed systems (either homogeneous or heterogeneous groups of machines) and systems dedicated to high-performance computing (HPC). For systems focused on processing the bulk of a graph's elements, common use-cases include executing algorithms for vertex ranking or community detection, which produce insights into the structure of the graph and the relevance of its elements. Many distributed systems (such as Flink and Spark) and libraries (e.g. Gelly, GraphX) have been built to enable these tasks and improve performance. This is achieved with techniques ranging from classic load balancing (often geared to reducing communication overhead) to exploring trade-offs between delaying computation and relaxing accuracy. In this survey we first familiarize the reader with common graph datasets and applications in use today. We then provide an overview of different aspects of the graph processing landscape and describe classes of systems along a set of dimensions. These dimensions encompass paradigms for expressing graph processing, the different types of systems available, coordination and communication models in distributed graph processing, partitioning techniques, and different definitions related to the potential for a graph to be updated. This survey is aimed at experienced software engineers and researchers as well as graduate students looking for an understanding of the landscape of solutions (and their limitations) for graph processing.
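
To make the idea of a global computation over an ingested graph concrete, the following is a minimal single-machine sketch of vertex ranking using PageRank by power iteration. It is an illustration only and is not taken from the survey: the function name `pagerank`, the edge-list input format, and the default parameter values are all assumptions chosen for this example.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Rank vertices of a directed graph given as (source, target) pairs.

    Illustrative sketch: every iteration touches every vertex and edge,
    which is the kind of bulk, global computation the abstract refers to.
    """
    # Ingest the edge list into an adjacency structure.
    out_neighbors = defaultdict(list)
    vertices = set()
    for src, dst in edges:
        out_neighbors[src].append(dst)
        vertices.update((src, dst))

    n = len(vertices)
    if n == 0:
        return {}

    # Start from a uniform distribution over vertices.
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(iterations):
        # Rank held by vertices without out-edges is spread uniformly,
        # so the ranks keep summing to 1.
        dangling = sum(rank[v] for v in vertices if not out_neighbors[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in vertices}

        # Each vertex sends an equal share of its rank along its out-edges.
        for src, dsts in out_neighbors.items():
            if not dsts:
                continue
            share = damping * rank[src] / len(dsts)
            for dst in dsts:
                new_rank[dst] += share

        rank = new_rank

    return rank


if __name__ == "__main__":
    # Hypothetical toy graph: a three-vertex cycle plus one extra source.
    toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
    for vertex, score in sorted(pagerank(toy_edges).items()):
        print(vertex, round(score, 4))
```

The per-vertex "send a share to each neighbor" step is the part that distributed systems spread across machines, which is where the partitioning and communication trade-offs mentioned in the abstract come into play.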


Updated: 2021-04-09
