Multi-Aspect Incremental Tensor Decomposition Based on Distributed In-Memory Big Data Systems
Journal of Data and Information Science (IF 1.5). Pub Date: 2020-05-20. DOI: 10.2478/jdis-2020-0010
Hye-Kyung Yang, Hwan-Seung Yong

Abstract

Purpose: We propose InParTen2, a multi-aspect parallel factor analysis (PARAFAC) three-dimensional tensor decomposition algorithm based on the Apache Spark framework. The proposed method reduces re-decomposition cost and can handle large tensors.

Design/methodology/approach: Because tensor growth enlarges a given tensor along all axes, the proposed method decomposes incoming tensors using the existing decomposition results, without generating sub-tensors. Additionally, InParTen2 avoids computing explicit Khatri–Rao products and minimizes shuffling on the Apache Spark platform.

Findings: The performance of InParTen2 was evaluated by comparing its execution time and accuracy with those of existing distributed tensor decomposition methods on various datasets. The results confirm that InParTen2 can process large tensors and reduce the re-calculation cost of tensor decomposition. Consequently, the proposed method is faster than existing tensor decomposition algorithms and significantly reduces re-decomposition cost.

Research limitations: Several Hadoop-based distributed tensor decomposition algorithms exist, as well as MATLAB-based methods. However, the former require longer iteration times, so their execution times are not directly comparable with those of Spark-based algorithms, while the latter run on a single machine, which limits the data sizes they can handle.

Practical implications: When new tensors are appended to a given tensor, the proposed algorithm decomposes them using the existing decomposition results, avoiding re-decomposition of the entire tensor and thereby reducing cost.

Originality/value: The proposed method handles large tensors and runs fast within the limited-memory framework of Apache Spark. Moreover, InParTen2 supports both static and incremental tensor decomposition.
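The incremental idea described above — fitting newly arrived tensor slices against the existing factor matrices instead of re-decomposing the whole tensor — can be sketched in plain NumPy. This is a hypothetical illustration, not the authors' Spark implementation: the naive ALS below computes Khatri–Rao products explicitly (exactly the cost InParTen2 is designed to avoid), handles growth along only one axis rather than all axes, and the function names (`cp_als`, `append_slices`) are invented for this sketch.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n matricization: rows indexed by the chosen axis."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Kronecker product of two factor matrices."""
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def reconstruct(A, B, C):
    """Rebuild a third-order tensor from its CP factor matrices."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def cp_als(T, rank, iters=200, seed=0):
    """Naive CP/PARAFAC decomposition by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(iters):
        A = unfold(T, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(T, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(T, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

def append_slices(T_new, A, B):
    """Incremental step: solve for factor rows of the new frontal
    slices only, reusing existing A and B (no full re-decomposition)."""
    return unfold(T_new, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
```

The point of the sketch is the cost asymmetry: `append_slices` is a single least-squares solve whose size scales with the number of new slices, whereas re-running `cp_als` on the concatenated tensor scales with the entire accumulated data.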
