Towards Efficient Canonical Polyadic Decomposition on Sunway Many-core Processor
Information Sciences, Pub Date: 2020-12-01, DOI: 10.1016/j.ins.2020.11.013
Ming Dun, Yunchun Li, Qingxiao Sun, Hailong Yang, Wei Li, Zhongzhi Luan, Lin Gan, Guangwen Yang, Depei Qian

Canonical Polyadic Decomposition (CPD) is one of the most popular tensor decomposition methods and plays an important role in big data analysis. For sparse tensors, the major computation in CPD, the matricized tensor times Khatri-Rao product (MTTKRP), exhibits discontinuous memory access and becomes the bottleneck to achieving high performance on emerging processor architectures. In this paper, we propose swCPD, an efficient CPD implementation on the Sunway many-core processor. swCPD accelerates optimization algorithms whose performance is dominated by MTTKRP, including Alternating Least Squares (ALS), Gradient Descent (GD), and Randomized Block Sampling (RBS), as well as the more recent fast Levenberg-Marquardt (fLM++) and Generalized Canonical Polyadic Decomposition with Stochastic Gradient Descent (GCP-SGD). The main idea in swCPD is a hierarchical partitioning mechanism. From the computation perspective, the 64 Computation Processing Elements (CPEs) in a Sunway processor are divided into eight groups, each containing seven workers and one controller. From the data perspective, we partition the sparse tensor at different granularities: blocks, bands, and tiles. Moreover, we develop a mechanism based on register communication for cooperation between CPEs. We evaluate swCPD with both synthesized and real-world datasets. The experimental results show that each optimized algorithm in swCPD achieves better performance than the corresponding algorithm in cutting-edge CPD implementations.
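
To make the MTTKRP bottleneck concrete, the following is a minimal sketch of a mode-0 MTTKRP for a 3-way sparse tensor stored in coordinate (COO) form. It is not the swCPD implementation: the function name `mttkrp_mode0`, the COO layout, and the tiny example data are assumptions chosen for illustration. The point it shows is that each nonzero gathers one row of each factor matrix at an index dictated by the sparsity pattern, which is the source of the discontinuous memory access mentioned in the abstract.

```python
def mttkrp_mode0(coords, vals, B, C, dim_i):
    """Illustrative mode-0 MTTKRP for a 3-way sparse tensor in COO form.

    coords: list of (i, j, k) index triples of the nonzeros (assumed layout)
    vals:   list of the corresponding nonzero values
    B, C:   factor matrices as lists of rows, each of length `rank`
    dim_i:  size of mode 0, i.e. the number of rows in the result
    """
    rank = len(B[0])
    M = [[0.0] * rank for _ in range(dim_i)]
    for (i, j, k), v in zip(coords, vals):
        # Rows j and k are set by the sparsity pattern, so consecutive
        # nonzeros may touch rows that are far apart in memory.
        row_b, row_c = B[j], C[k]
        for r in range(rank):
            M[i][r] += v * row_b[r] * row_c[r]
    return M


# Hypothetical 2x2x2 tensor with three nonzeros and rank-2 factors.
coords = [(0, 0, 0), (0, 1, 1), (1, 1, 0)]
vals = [1.0, 2.0, 3.0]
B = [[1.0, 2.0], [3.0, 4.0]]
C = [[5.0, 6.0], [7.0, 8.0]]
M = mttkrp_mode0(coords, vals, B, C, dim_i=2)
print(M)  # [[47.0, 76.0], [45.0, 72.0]]
```

In this sketch the output row `M[i]` is also written at an irregular index, so a parallel version has to either partition the nonzeros so that workers own disjoint output rows or combine partial results afterwards. Partitioning the tensor into blocks, bands, and tiles, as the abstract describes, is one way to organize that ownership.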




Updated: 2020-12-01