Augmented Tensor Decomposition with Stochastic Optimization
arXiv - CS - Numerical Analysis. Pub Date: 2021-06-15. arXiv: 2106.07900
Chaoqi Yang, Cheng Qian, Navjot Singh, Cao Xiao, M Brandon Westover, Edgar Solomonik, Jimeng Sun

Tensor decompositions are powerful tools for dimensionality reduction and feature interpretation of multidimensional data such as signals. Existing tensor decomposition objectives (e.g., the Frobenius norm) are designed for fitting raw data under statistical assumptions, which may not align with downstream classification tasks. Also, real-world tensor data are usually high-order and have large dimensions with millions or billions of entries, so decomposing the whole tensor with traditional algorithms is expensive. In practice, raw tensor data also contain redundant information, while data augmentation techniques may be used to smooth out noise in samples. This paper addresses the above challenges by proposing augmented tensor decomposition (ATD), which effectively incorporates data augmentations to boost downstream classification. To reduce the memory footprint of the decomposition, we propose a stochastic algorithm that updates the factor matrices in a batch fashion. We evaluate ATD on multiple signal datasets. It shows comparable or better performance (e.g., up to 15% higher accuracy) than self-supervised and autoencoder baselines while using less than 5% of their model parameters, achieves a 0.6%-1.3% accuracy gain over other tensor-based baselines, and reduces the memory footprint by 9x compared to standard tensor decomposition algorithms.
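To make the "stochastic, batch-wise factor updates" idea concrete, here is a minimal sketch of entry-sampling SGD for a 3-way CP decomposition. This is an illustration of the general technique only, not the authors' ATD algorithm (which additionally builds data augmentation into the objective); the function name, hyperparameters, and initialization scale are all invented for the example. Each step touches only a sampled batch of tensor entries and the corresponding factor rows, so the full tensor never has to be materialized in the update.

```python
import numpy as np

def stochastic_cp(T, rank, batch=128, lr=0.01, steps=5000, seed=0):
    """Fit a rank-`rank` CP model A, B, C to a 3-way tensor T by SGD
    over randomly sampled entries of the Frobenius-norm objective
    ||T - [[A, B, C]]||^2, updating only the sampled factor rows."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = 0.5 * rng.standard_normal((I, rank))
    B = 0.5 * rng.standard_normal((J, rank))
    C = 0.5 * rng.standard_normal((K, rank))
    for _ in range(steps):
        # Sample a batch of entry indices (i, j, k).
        i = rng.integers(0, I, batch)
        j = rng.integers(0, J, batch)
        k = rng.integers(0, K, batch)
        # Model prediction for each sampled entry: sum_r A[i,r] B[j,r] C[k,r].
        pred = np.einsum('br,br,br->b', A[i], B[j], C[k])
        err = pred - T[i, j, k]            # per-entry residual
        # Gradients of 0.5 * err^2 w.r.t. the sampled factor rows.
        gA = err[:, None] * (B[j] * C[k])
        gB = err[:, None] * (A[i] * C[k])
        gC = err[:, None] * (A[i] * B[j])
        # Scatter the updates back onto the sampled rows only.
        np.add.at(A, i, -lr * gA)
        np.add.at(B, j, -lr * gB)
        np.add.at(C, k, -lr * gC)
    return A, B, C
```

A slight over-parameterization of the rank (fitting with a rank a bit above the target's) tends to make this kind of stochastic fit more reliable in practice.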

Updated: 2021-06-17