SliceNStitch: Continuous CP Decomposition of Sparse Tensor Streams
arXiv - CS - Social and Information Networks Pub Date : 2021-02-23 , DOI: arxiv-2102.11517
Taehyung Kwon, Inkyu Park, Dongjin Lee, Kijung Shin

Consider traffic data (i.e., triplets in the form of source-destination-timestamp) that grow over time. Tensors (i.e., multi-dimensional arrays) with a time mode are widely used for modeling and analyzing such multi-aspect data streams. In such tensors, however, new entries are added only once per period, which is often an hour, a day, or even a year. This discreteness of tensors has limited their usage for real-time applications, where new data should be analyzed instantly as it arrives. How can we analyze time-evolving multi-aspect sparse data 'continuously' using tensors where time is 'discrete'? We propose SLICENSTITCH for continuous CANDECOMP/PARAFAC (CP) decomposition, which has numerous time-critical applications, including anomaly detection, recommender systems, and stock market prediction. SLICENSTITCH changes the starting point of each period adaptively, based on the current time, and updates factor matrices (i.e., outputs of CP decomposition) instantly as new data arrives. We show, theoretically and experimentally, that SLICENSTITCH is (1) 'Any time': updating factor matrices immediately without having to wait until the current time period ends, (2) Fast: with constant-time updates up to 759x faster than online methods, and (3) Accurate: with fitness comparable (specifically, 72-160%) to offline methods.
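
The idea of anchoring period boundaries at the current time, rather than at fixed clock boundaries, can be illustrated with a minimal Python sketch. It only shows how the time-mode index of an entry shifts as the clock advances. It is not the authors' algorithm and performs no factor-matrix updates. The function name `window_tensor`, its parameters, and the sample events are hypothetical, chosen for illustration only.

```python
import math
from collections import defaultdict


def window_tensor(events, now, period, num_periods):
    """Build a sparse tensor {(src, dst, slot): value} for the time window
    (now - num_periods * period, now].

    Slot boundaries are measured backwards from `now`, so the same event
    can fall into a different slot as `now` advances (illustrative only).
    """
    start = now - num_periods * period
    tensor = defaultdict(float)
    for src, dst, t, value in events:
        if start < t <= now:
            # Slot k covers (start + k * period, start + (k + 1) * period].
            slot = math.ceil((t - start) / period) - 1
            tensor[(src, dst, slot)] += value
    return dict(tensor)


# Hypothetical source-destination-timestamp events with unit weight.
events = [("a", "b", 1, 1.0), ("a", "b", 4, 1.0), ("a", "c", 5, 1.0)]

at_6 = window_tensor(events, now=6, period=3, num_periods=2)
# {("a", "b", 0): 1.0, ("a", "b", 1): 1.0, ("a", "c", 1): 1.0}

at_7 = window_tensor(events, now=7, period=3, num_periods=2)
# {("a", "b", 0): 1.0, ("a", "c", 1): 1.0}
```

Advancing `now` from 6 to 7 drops the event at time 1 from the window and moves the event at time 4 from slot 1 to slot 0, without waiting for a fixed period boundary. A continuous method would then have to update the factor matrices for exactly these changed entries.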

Updated: 2021-02-24