Partitioning Models for General Medium-Grain Parallel Sparse Tensor Decomposition
IEEE Transactions on Parallel and Distributed Systems (IF 5.6), Pub Date: 2020-08-04, DOI: 10.1109/tpds.2020.3012624
M. Ozan Karsavuran, Seher Acer, Cevdet Aykanat

The focus of this article is efficient parallelization of the canonical polyadic decomposition algorithm utilizing the alternating least squares method for sparse tensors on distributed-memory architectures. We propose a hypergraph model for general medium-grain partitioning which does not enforce any topological constraint on the partitioning. The proposed model is based on splitting the given tensor into nonzero-disjoint component tensors. Then a mode-dependent coarse-grain hypergraph is constructed for each component tensor. A net amalgamation operation is proposed to form a composite medium-grain hypergraph from these mode-dependent coarse-grain hypergraphs to correctly encapsulate the minimization of the communication volume. We propose a heuristic which splits the nonzeros of dense slices to obtain sparse slices in component tensors. So we partially attain slice coherency at (sub)slice level since partitioning is performed on (sub)slices instead of individual nonzeros. We also utilize the well-known recursive-bipartitioning framework to improve the quality of the splitting heuristic. Finally, we propose a medium-grain tripartite graph model with the aim of a faster partitioning at the expense of increasing the total communication volume. Parallel experiments conducted on 10 real-world tensors on up to 1024 processors confirm the validity of the proposed hypergraph and graph models.
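
The communication-volume objective mentioned in the abstract can be illustrated with a small sketch. It assumes the standard connectivity-minus-one metric from hypergraph partitioning: every slice (one index in one mode, corresponding to one factor-matrix row) that is touched by nonzeros on several processors costs one unit per extra processor. The abstract does not spell out the exact cost model, so the function name `communication_volume` and the toy tensor below are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

def communication_volume(nonzeros, part):
    """Connectivity-minus-one volume of a nonzero-to-processor assignment.

    nonzeros: list of index tuples, one tuple per tensor nonzero.
    part:     list of processor ids, part[n] owns nonzeros[n].
    Assumption: each (mode, index) slice acts as a net, and a net
    spanning `c` processors costs `c - 1` units of communication.
    """
    touched = defaultdict(set)  # (mode, index) -> processors touching that slice
    for idx, proc in zip(nonzeros, part):
        for mode, i in enumerate(idx):
            touched[(mode, i)].add(proc)
    return sum(len(procs) - 1 for procs in touched.values())

# Hypothetical 2 x 2 x 2 tensor with four nonzeros.
nonzeros = [(0, 0, 0), (0, 1, 0), (1, 1, 1), (1, 0, 1)]

# Slice-coherent assignment: mode-0 and mode-2 slices stay on one processor.
coherent = [0, 0, 1, 1]
# Scattered assignment: every slice is shared by both processors.
scattered = [0, 1, 0, 1]

volume_coherent = communication_volume(nonzeros, coherent)
volume_scattered = communication_volume(nonzeros, scattered)
```

In this toy case the slice-coherent assignment cuts only the two mode-1 slices, while the scattered assignment cuts all six slices, which is the kind of gap that partitioning at the (sub)slice level rather than per nonzero is meant to exploit.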

Updated: 2020-08-04