Concurrent Alternating Least Squares for multiple simultaneous Canonical Polyadic Decompositions
arXiv - CS - Mathematical Software. Pub Date: 2020-10-09, DOI: arxiv-2010.04678
Christos Psarras, Lars Karlsson and Paolo Bientinesi

Tensor decompositions, such as CANDECOMP/PARAFAC (CP), are widely used in a variety of applications, such as chemometrics, signal processing, and machine learning. A broadly used method for computing such decompositions relies on the Alternating Least Squares (ALS) algorithm. When the number of components is small, regardless of its implementation, ALS exhibits low arithmetic intensity, which severely hinders its performance and makes GPU offloading ineffective. We observe that, in practice, experts often have to compute multiple decompositions of the same tensor, each with a small number of components (typically fewer than 20), to ultimately find the best ones to use for the application at hand. In this paper, we illustrate how multiple decompositions of the same tensor can be fused together at the algorithmic level to increase the arithmetic intensity. Therefore, it becomes possible to make efficient use of GPUs for further speedups; at the same time the technique is compatible with many enhancements typically used in ALS, such as line search, extrapolation, and non-negativity constraints. We introduce the Concurrent ALS algorithm and library, which offers an interface to Matlab, and a mechanism to effectively deal with the issue that decompositions complete at different times. Experimental results on artificial and real datasets demonstrate a shorter time to completion due to increased arithmetic intensity.
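One way to picture the fusion described in the abstract is through the matricized-tensor times Khatri-Rao product (MTTKRP), the most expensive kernel in each ALS update. The following is a minimal NumPy sketch, not the authors' library code. It assumes a dense 3-way tensor, and it assumes that fusing amounts to concatenating the Khatri-Rao products of several models column-wise, so that one wide matrix multiplication replaces many thin ones. The function names (`khatri_rao`, `fused_mttkrp_mode0`) and the `(A, B, C)` model layout are hypothetical choices made for illustration.

```python
import numpy as np


def khatri_rao(C, B):
    """Column-wise Kronecker product of C (K x R) and B (J x R).

    Row k*J + j of the result holds C[k, :] * B[j, :].
    """
    K, R = C.shape
    J, _ = B.shape
    return (C[:, None, :] * B[None, :, :]).reshape(K * J, R)


def fused_mttkrp_mode0(X, models):
    """Mode-0 MTTKRP for several CP models of the same tensor at once.

    X      : dense tensor of shape (I, J, K)
    models : list of (A, B, C) factor triples (hypothetical layout);
             each model may have a different number of components.
    Returns one (I x R_m) matrix per model.
    """
    I, J, K = X.shape
    # Mode-0 unfolding whose column index is j + J*k, matching khatri_rao.
    X0 = X.reshape(I, J * K, order="F")
    # Stack every model's Khatri-Rao product side by side ...
    kr_all = np.hstack([khatri_rao(C, B) for (_, B, C) in models])
    # ... so a single wide matrix product serves all models.
    M_all = X0 @ kr_all
    # Cut the result back into one block of columns per model.
    ranks = [B.shape[1] for (_, B, _) in models]
    return np.split(M_all, np.cumsum(ranks)[:-1], axis=1)


# Usage: three small models (2, 3, and 5 components) of one tensor.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))
models = [
    tuple(rng.standard_normal((n, r)) for n in X.shape)
    for r in (2, 3, 5)
]
results = fused_mttkrp_mode0(X, models)
```

In this sketch the fused product has 10 columns instead of 2, 3, and 5 in separate calls, so the unfolded tensor is read once for all models. This is the sense in which arithmetic intensity rises as more small models are grouped together.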

Updated: 2020-10-12