NSVD: Normalized Singular Value Deviation Reveals Number of Latent Factors in Tensor Decomposition.
Big Data (IF 4.6), Pub Date: 2020-10-19, DOI: 10.1089/big.2020.0074. Yorgos Tsitsikas, Evangelos E. Papalexakis
Tensor decomposition has repeatedly been shown to be an effective tool in multi-aspect data mining, especially in exploratory applications where the goal is to discover hidden interpretable structure in the data. In such exploratory applications, the number of hidden structures is of utmost importance, since an incorrect selection may lead to the discovery of noisy artifacts that do not represent a meaningful pattern. Although extremely important, selecting this number of latent factors, also known as the low rank, is very hard, and in most cases practitioners and researchers resort to ad hoc trial and error or assume that the number is somehow known or given via domain expertise. A considerable amount of prior work proposes heuristics for selecting this low rank. However, as we argue in this article, the state-of-the-art heuristic methods are rather unstable and do not always reveal the correct answer. In this article, we propose the Normalized Singular Value Deviation (NSVD), a novel method for selecting the number of latent factors in tensor decomposition that rests on principled theoretical foundations. We extensively evaluate the effectiveness of NSVD on synthetic and real data and demonstrate that it yields a more robust, stable, and reliable estimate than the state of the art. Finally, we provide an efficient compression scheme that facilitates the use of NSVD on very large tensors.
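To make the problem concrete, the following is a minimal, hypothetical sketch of singular-value-based rank estimation for a noisy low-rank tensor; it is not the paper's exact NSVD criterion, only an illustration of the general family of spectrum-based heuristics it improves upon. The sketch unfolds the tensor along one mode, computes its singular values, and picks the rank at the largest ratio gap between consecutive singular values.

```python
# Hypothetical illustration (NOT the paper's NSVD criterion): estimate the
# number of latent factors of a noisy low-rank tensor from the singular-value
# spectrum of a mode unfolding, using the largest consecutive-value ratio gap.
import numpy as np

def estimate_rank(tensor, max_rank=10):
    """Estimate rank from the mode-0 unfolding's singular values."""
    unfolding = tensor.reshape(tensor.shape[0], -1)      # mode-0 unfolding
    s = np.linalg.svd(unfolding, compute_uv=False)       # descending order
    ratios = s[:max_rank] / s[1:max_rank + 1]            # drop between neighbors
    return int(np.argmax(ratios)) + 1                    # largest drop marks the rank

# Build a rank-3 tensor as a sum of outer products of random factors, plus noise.
rng = np.random.default_rng(0)
true_rank, dims = 3, (20, 20, 20)
tensor = sum(
    np.einsum("i,j,k->ijk",
              rng.standard_normal(dims[0]),
              rng.standard_normal(dims[1]),
              rng.standard_normal(dims[2]))
    for _ in range(true_rank)
)
tensor += 0.01 * rng.standard_normal(dims)

print(estimate_rank(tensor))  # recovers 3 in this easy, low-noise setting
```

Gap-based heuristics like this one work well when the noise is small, but as the article argues, they become unstable at realistic noise levels, which motivates the more principled NSVD approach.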
Updated: 2020-10-30