True Load Balancing for Matricized Tensor Times Khatri-Rao Product
IEEE Transactions on Parallel and Distributed Systems (IF 5.3) Pub Date: 2021-01-22, DOI: 10.1109/tpds.2021.3053836
Nabil Abubaker , Seher Acer , Cevdet Aykanat

MTTKRP is the bottleneck operation in algorithms used to compute the CP tensor decomposition. For sparse tensors, utilizing the compressed sparse fiber (CSF) storage format and CSF-oriented MTTKRP algorithms is important for both memory and computational efficiency on distributed-memory architectures. Existing intelligent tensor partitioning models assume the computational cost of MTTKRP to be proportional to the total number of nonzeros in the tensor. However, this is not the case for CSF-oriented MTTKRP on distributed-memory architectures. We outline two deficiencies of nonzero-based intelligent partitioning models when CSF-oriented MTTKRP operations are performed locally: failure to encode processors' computational loads, and an increase in total computation due to fiber fragmentation. We focus on the existing fine-grain hypergraph model and propose a novel vertex weighting scheme that enables this model to encode the correct computational loads of processors. We also propose augmenting the fine-grain model with fiber nets to reduce the increase in total computational load by minimizing fiber fragmentation. In this way, the proposed model encodes minimizing the load of the bottleneck processor. Parallel experiments with real-world sparse tensors on up to 1024 processors validate the outlined deficiencies and demonstrate the merit of the proposed improvements in terms of parallel runtimes.
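To make the cost argument concrete, below is a minimal illustrative sketch (not the authors' code) of a mode-0 MTTKRP over a 3-mode sparse tensor whose nonzeros are grouped into fibers, in the spirit of CSF. The names (`fibers`, `B`, `C`, `R`) and the fiber grouping over index pairs (i, j) are assumptions for illustration only. The point it shows is that each fiber contributes its own per-fiber work on top of the per-nonzero work, so a nonzero-only count misestimates a processor's load.

```python
# Illustrative sketch (assumed layout, not the paper's implementation):
# a 3-mode MTTKRP where nonzeros are grouped into mode-(0,1) fibers.
import numpy as np

def mttkrp_mode0(fibers, B, C, I, R):
    """fibers: dict mapping (i, j) -> list of (k, value) nonzeros in that fiber.
    B (J x R) and C (K x R) are the factor matrices of the other two modes.
    Returns M = X_(0) * (C khatri-rao B), an I x R matrix."""
    M = np.zeros((I, R))
    for (i, j), nnzs in fibers.items():   # one pass per fiber
        acc = np.zeros(R)
        for k, val in nnzs:               # per-nonzero work: length-R axpy
            acc += val * C[k]
        M[i] += acc * B[j]                # per-fiber work: length-R Hadamard
    return M

# Contrast between a nonzero-only load estimate and a fiber-aware one.
def load_nnz_only(fibers):
    return sum(len(nnzs) for nnzs in fibers.values())

def load_fiber_aware(fibers):
    # each nonzero costs one length-R multiply-add; each fiber adds one
    # extra length-R Hadamard and accumulation
    return sum(len(nnzs) + 1 for nnzs in fibers.values())
```

Under this (assumed) cost model, splitting a fiber's nonzeros across processors duplicates the per-fiber Hadamard on every part, which is one way fiber fragmentation can inflate the total computational load that the fiber nets in the proposed model are meant to penalize.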

Updated: 2021-02-23