A parameter-less algorithm for tensor co-clustering
Machine Learning (IF 4.3) Pub Date: 2021-06-11, DOI: 10.1007/s10994-021-06002-w
Elena Battaglia , Ruggero G. Pensa

The majority of the data produced by human activities and modern cyber-physical systems involve complex relations among their features. Such relations can often be represented by means of tensors, which can be viewed as generalizations of matrices and, as such, can be analyzed with higher-order extensions of existing machine learning methods, such as clustering and co-clustering. Tensor co-clustering, in particular, has proven useful in many applications, due to its ability to cope with n-modal data and sparsity. However, setting up a co-clustering algorithm properly requires specifying the desired number of clusters for each mode as an input parameter. This choice is already difficult in relatively easy settings, such as flat clustering on data matrices, but on tensors it can be even more frustrating. To address this issue, we propose a new tensor co-clustering algorithm that does not require the number of desired co-clusters as input, as it optimizes an objective function based on a measure of association across discrete random variables (Goodman and Kruskal's \(\tau\)) that is not affected by their cardinality. We introduce different optimization schemes and show their theoretical and empirical convergence properties. Additionally, we show the effectiveness of our algorithm on both synthetic and real-world datasets, also in comparison with state-of-the-art co-clustering methods based on tensor factorization and latent block models.
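For reference, Goodman and Kruskal's \(\tau\) is the classical proportional-reduction-in-error measure of association between two discrete variables \(X\) and \(Y\). The abstract does not spell out its form; the textbook definition (the notation below is the standard one, not taken from the paper), for a contingency table with joint proportions \(p_{ij}\), row marginals \(p_{i\cdot}\) and column marginals \(p_{\cdot j}\), is

\[
\tau_{Y\mid X} \;=\; \frac{\displaystyle\sum_{i}\sum_{j}\frac{p_{ij}^{2}}{p_{i\cdot}} \;-\; \sum_{j} p_{\cdot j}^{2}}{1 \;-\; \sum_{j} p_{\cdot j}^{2}},
\]

i.e., the relative reduction in the error of predicting \(Y\) once \(X\) is known. Because \(\tau\) lies in \([0,1]\) irrespective of how many categories \(X\) and \(Y\) take, an objective built on it does not inherently favour a particular number of co-clusters, which is what allows the proposed algorithm to dispense with the cluster numbers as input parameters.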



Updated: 2021-06-13