Distributed Graph Computation Meets Machine Learning
IEEE Transactions on Parallel and Distributed Systems (IF 5.3) Pub Date: 2020-07-01, DOI: 10.1109/tpds.2020.2970047
Wencong Xiao, Jilong Xue, Youshan Miao, Zhen Li, Cheng Chen, Ming Wu, Wei Li, Lidong Zhou

TuX2 is a new distributed graph engine that bridges graph computation and distributed machine learning. TuX2 inherits the benefits of an elegant graph computation model, an efficient graph layout, and balanced parallelism to scale to billion-edge graphs, while being extended and optimized for distributed machine learning to support heterogeneity in the data model, Stale Synchronous Parallel scheduling, and a new Mini-batch, Exchange, GlobalSync, and Apply (MEGA) programming model. TuX2 further introduces a hybrid vertex-cut graph optimization and supports various consistency models in fault tolerance for machine learning. We have developed a set of representative distributed machine learning algorithms in TuX2, covering both supervised and unsupervised learning. Compared to the implementations on distributed machine learning platforms, writing those algorithms in TuX2 takes only about 25 percent of the code: our graph computation model hides the detailed management of data layout, partitioning, and parallelism from developers. Our extensive evaluation of TuX2, using large datasets with up to 64 billion edges, shows that TuX2 outperforms the state-of-the-art distributed graph engines PowerGraph and PowerLyra by an order of magnitude, while beating two state-of-the-art distributed machine learning systems by at least 60 percent.
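To make the MEGA stages named in the abstract more concrete, the following is a minimal, hypothetical sketch of how a machine learning algorithm (here, logistic regression via mini-batch gradient descent) might be expressed in Mini-batch, Exchange, GlobalSync, and Apply phases. It is not the actual TuX2 API: all function names, the toy single-process driver, and the dictionary-based data layout are illustrative assumptions only.

```python
# Illustrative MEGA-style sketch (Mini-batch, Exchange, GlobalSync, Apply).
# NOT the TuX2 API; names and the single-process driver are hypothetical.
import math
import random

def exchange(sample_vertex, weight_vertices, accum):
    """Exchange: per sample's edges, compute a partial gradient from the
    sample (data) vertex and its connected weight (feature) vertices."""
    margin = sum(weight_vertices[f] * v for f, v in sample_vertex["features"].items())
    pred = 1.0 / (1.0 + math.exp(-margin))
    err = pred - sample_vertex["label"]
    for f, v in sample_vertex["features"].items():
        accum[f] = accum.get(f, 0.0) + err * v

def apply_update(weight_vertices, accum, lr=0.1):
    """Apply: per weight vertex, fold the accumulated gradient into its value."""
    for f, g in accum.items():
        weight_vertices[f] -= lr * g

def global_sync(weight_vertices):
    """GlobalSync: in a distributed run this would synchronize mirror and
    master copies of weight vertices; here it is a no-op placeholder."""
    return weight_vertices

def train(samples, num_features, mini_batch=2, epochs=5):
    weights = {f: 0.0 for f in range(num_features)}
    for _ in range(epochs):
        random.shuffle(samples)
        for i in range(0, len(samples), mini_batch):  # Mini-batch granularity
            accum = {}
            for s in samples[i:i + mini_batch]:
                exchange(s, weights, accum)
            apply_update(weights, accum)
            global_sync(weights)
    return weights

if __name__ == "__main__":
    data = [
        {"features": {0: 1.0, 1: 0.5}, "label": 1},
        {"features": {0: 0.2, 1: 1.5}, "label": 0},
        {"features": {0: 0.9, 1: 0.3}, "label": 1},
        {"features": {0: 0.1, 1: 1.2}, "label": 0},
    ]
    print(train(data, num_features=2))
```

The point of the sketch is the division of labor: per-edge gradient work sits in Exchange, per-vertex state updates in Apply, cross-worker synchronization in GlobalSync, and the mini-batch size controls how often updates are applied and synchronized.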

Updated: 2020-07-01