Distributed Graph Computation Meets Machine Learning
IEEE Transactions on Parallel and Distributed Systems (IF 5.3) Pub Date: 2020-07-01, DOI: 10.1109/tpds.2020.2970047
Wencong Xiao, Jilong Xue, Youshan Miao, Zhen Li, Cheng Chen, Ming Wu, Wei Li, Lidong Zhou

TuX2 is a new distributed graph engine that bridges graph computation and distributed machine learning. TuX2 inherits the benefits of an elegant graph computation model, an efficient graph layout, and balanced parallelism to scale to billion-edge graphs, while being extended and optimized for distributed machine learning to support heterogeneity in the data model, Stale Synchronous Parallel scheduling, and a new Mini-batch, Exchange, GlobalSync, and Apply (MEGA) programming model. TuX2 further introduces a hybrid vertex-cut graph optimization and supports various consistency models in fault tolerance for machine learning. We have developed a set of representative distributed machine learning algorithms in TuX2, covering both supervised and unsupervised learning. Compared to the implementations on distributed machine learning platforms, writing those algorithms in TuX2 takes only about 25 percent of the code: our graph computation model hides the detailed management of data layout, partitioning, and parallelism from developers. Our extensive evaluation of TuX2, using large datasets with up to 64 billion edges, shows that TuX2 outperforms the state-of-the-art distributed graph engines PowerGraph and PowerLyra by an order of magnitude, while beating two state-of-the-art distributed machine learning systems by at least 60 percent.
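To make the MEGA stages named in the abstract more concrete, the following is a minimal, hypothetical sketch of how a machine learning algorithm (here, logistic regression via mini-batch gradient descent) might be expressed in Mini-batch, Exchange, GlobalSync, and Apply phases. It is not the actual TuX2 API: all function names, the toy single-process driver, and the dictionary-based data layout are illustrative assumptions only.

```python
# Illustrative MEGA-style sketch (Mini-batch, Exchange, GlobalSync, Apply).
# NOT the TuX2 API; names and the single-process driver are hypothetical.
import math
import random

def exchange(sample_vertex, weight_vertices, accum):
    """Exchange: per sample's edges, compute a partial gradient from the
    sample (data) vertex and its connected weight (feature) vertices."""
    margin = sum(weight_vertices[f] * v for f, v in sample_vertex["features"].items())
    pred = 1.0 / (1.0 + math.exp(-margin))
    err = pred - sample_vertex["label"]
    for f, v in sample_vertex["features"].items():
        accum[f] = accum.get(f, 0.0) + err * v

def apply_update(weight_vertices, accum, lr=0.1):
    """Apply: per weight vertex, fold the accumulated gradient into its value."""
    for f, g in accum.items():
        weight_vertices[f] -= lr * g

def global_sync(weight_vertices):
    """GlobalSync: in a distributed run this would synchronize mirror and
    master copies of weight vertices; here it is a no-op placeholder."""
    return weight_vertices

def train(samples, num_features, mini_batch=2, epochs=5):
    weights = {f: 0.0 for f in range(num_features)}
    for _ in range(epochs):
        random.shuffle(samples)
        for i in range(0, len(samples), mini_batch):  # Mini-batch granularity
            accum = {}
            for s in samples[i:i + mini_batch]:
                exchange(s, weights, accum)
            apply_update(weights, accum)
            global_sync(weights)
    return weights

if __name__ == "__main__":
    data = [
        {"features": {0: 1.0, 1: 0.5}, "label": 1},
        {"features": {0: 0.2, 1: 1.5}, "label": 0},
        {"features": {0: 0.9, 1: 0.3}, "label": 1},
        {"features": {0: 0.1, 1: 1.2}, "label": 0},
    ]
    print(train(data, num_features=2))
```

The point of the sketch is the division of labor: per-edge gradient work sits in Exchange, per-vertex state updates in Apply, cross-worker synchronization in GlobalSync, and the mini-batch size controls how often updates are applied and synchronized.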

Updated: 2020-07-01