Kernel Biclustering algorithm in Hilbert Spaces
arXiv - MATH - Statistics Theory Pub Date : 2022-08-07 , DOI: arxiv-2208.03675
Marcos Matabuena, J. C Vidal, Oscar Hernan Madrid Padilla, Dino Sejdinovic

Biclustering algorithms partition data and covariates simultaneously, providing new insights in several domains, such as analyzing gene expression to discover new biological functions. This paper develops a new model-free biclustering algorithm in abstract spaces using the notions of energy distance (ED) and maximum mean discrepancy (MMD), two distances between probability distributions capable of handling complex data such as curves or graphs. The proposed method can learn more general and complex cluster shapes than most existing approaches in the literature, which usually focus on detecting mean and variance differences. Although the biclustering configurations of our approach are constrained to create disjoint structures at the datum and covariate levels, the results are competitive. Given a proper kernel choice, our results are similar to those of state-of-the-art methods in their optimal scenarios, and our method outperforms them when cluster differences are concentrated in higher-order moments. The model's performance has been tested in several situations that involve simulated and real-world datasets. Finally, new theoretical consistency results are established using tools from the theory of optimal transport.
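
As a rough illustration of the two distances named in the abstract, the sketch below computes plug-in (V-statistic) estimates of the squared MMD and of the energy distance between two samples of Euclidean vectors. It is not the paper's biclustering algorithm. The Gaussian kernel, the `bandwidth` parameter, and the function names are assumptions made for this example only.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    # Squared Euclidean distances between every row of a and every row of b.
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def mmd2_gaussian(x, y, bandwidth=1.0):
    # Biased (V-statistic) estimate of the squared MMD.
    # Assumption: Gaussian kernel with a hand-picked bandwidth.
    def k(a, b):
        return np.exp(-pairwise_sq_dists(a, b) / (2.0 * bandwidth ** 2))

    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()


def energy_distance(x, y):
    # V-statistic estimate of the energy distance:
    # 2 E|X - Y| - E|X - X'| - E|Y - Y'|.
    def d(a, b):
        return np.sqrt(pairwise_sq_dists(a, b))

    return 2.0 * d(x, y).mean() - d(x, x).mean() - d(y, y).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    same_a = rng.normal(size=(200, 2))
    same_b = rng.normal(size=(200, 2))
    shifted = rng.normal(loc=2.0, size=(200, 2))
    # Samples from the same distribution should give smaller values
    # than samples whose distributions differ.
    print("MMD^2 same:", mmd2_gaussian(same_a, same_b))
    print("MMD^2 shifted:", mmd2_gaussian(same_a, shifted))
    print("ED same:", energy_distance(same_a, same_b))
    print("ED shifted:", energy_distance(same_a, shifted))
```

Both estimates are zero when a sample is compared with itself and grow as the two empirical distributions move apart. For curves or graphs, the Euclidean distance and Gaussian kernel used here would have to be replaced by a distance or kernel suited to that space.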

Updated: 2022-08-09