DHPV: a distributed algorithm for large-scale graph partitioning.
Journal of Big Data (IF 8.6), Pub Date: 2020-09-16, DOI: 10.1186/s40537-020-00357-y
Wilfried Yves Hamilton Adoni, Tarik Nahhal, Moez Krichen, Abdeltif El Byed, Ismail Assayad

Big graphs are part of the “Not Only SQL” (NoSQL) database movement, which focuses on the relationships between data rather than on the values themselves. The data are stored in vertices, while the edges model the interactions or relationships between them. Big graphs offer flexibility in handling highly interconnected data. Analyzing a big graph generally involves exploring all of its vertices, which is costly in time and resources because big graphs typically consist of millions of vertices connected by billions of edges. Consequently, graph algorithms are expensive relative to the size of a big graph and are therefore ineffective for data exploration. Partitioning the graph thus stands out as an efficient and less expensive alternative for exploring a big graph. This technique consists of dividing the graph into a set of k sub-graphs in order to reduce the complexity of queries. However, it presents many challenges because graph partitioning is an NP-complete problem. In this article, we present DPHV (Distributed Placement of Hub-Vertices), an efficient parallel and distributed heuristic for large-scale graph partitioning. An application to real-world graphs demonstrates the feasibility and reliability of our method. Experiments carried out on a 10-node Spark cluster demonstrated that the proposed method achieves a significant gain in terms of time and outperforms JA-BE-JA, Greedy, and DFEP.
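To make "dividing the graph into k sub-graphs" concrete, the following is a minimal sketch, assuming the two quality measures most commonly used for k-way partitioning: the edge cut (edges whose endpoints fall in different sub-graphs, which a query may have to follow across machines) and the balance of part sizes. The abstract does not state which metrics the paper uses. The toy streaming greedy heuristic is hypothetical and for illustration only; it is neither DPHV nor necessarily the "Greedy" baseline compared in the paper. All function names (`edge_cut`, `imbalance`, `greedy_partition`) are invented for this example.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def edge_cut(edges: Iterable[Edge], part: Dict[Hashable, int]) -> int:
    """Count edges whose endpoints are assigned to different sub-graphs."""
    return sum(1 for u, v in edges if part[u] != part[v])


def imbalance(part: Dict[Hashable, int], k: int) -> float:
    """Largest part size divided by the ideal size |V| / k (1.0 = perfectly balanced)."""
    sizes = Counter(part.values())
    ideal = len(part) / k
    return max(sizes.get(i, 0) for i in range(k)) / ideal


def greedy_partition(edges: Iterable[Edge], k: int, capacity: int) -> Dict[Hashable, int]:
    """Hypothetical streaming greedy heuristic (NOT DPHV): visit vertices in
    first-seen order and place each one in the non-full part that already holds
    most of its neighbours. Requires k * capacity >= number of vertices."""
    neighbours: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    part: Dict[Hashable, int] = {}
    sizes = [0] * k
    for v in neighbours:  # dicts preserve insertion (first-seen) order
        # Score each part by how many of v's already-placed neighbours it holds.
        scores = [0] * k
        for n in neighbours[v]:
            if n in part:
                scores[part[n]] += 1
        # Among parts with free capacity, prefer most neighbours, then the smaller part.
        candidates = [i for i in range(k) if sizes[i] < capacity]
        best = max(candidates, key=lambda i: (scores[i], -sizes[i]))
        part[v] = best
        sizes[best] += 1
    return part
```

On two triangles joined by a single edge, `[(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]`, with `k=2` and `capacity=3`, the greedy pass keeps each triangle together, giving an edge cut of 1 and an imbalance of 1.0, whereas assigning vertices by parity (`v % 2`) cuts 5 of the 7 edges. Heuristics of this kind trade partition quality for speed, which is why exact methods are impractical for an NP-complete problem at the scale of billions of edges.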

Updated: 2020-09-16