Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2022-05-24, DOI: 10.1080/10618600.2022.2067860
Dongbang Yuan, Irina Gaynanova
Abstract
We consider the problem of extracting joint and individual signals from multi-view data, that is, data collected from different sources on matched samples. While existing methods for multi-view data decomposition explore single matching of data by samples, we focus on double-matched multi-view data (matched by both samples and source features). Our motivating example is the miRNA data collected from both primary tumor and normal tissues of the same subjects; the measurements from two tissues are thus matched both by subjects and by miRNAs. Our proposed double-matched matrix decomposition allows us to simultaneously extract joint and individual signals across subjects, as well as joint and individual signals across miRNAs. Our estimation approach takes advantage of double-matching by formulating a new type of optimization problem with explicit row space and column space constraints, for which we develop an efficient iterative algorithm. Numerical studies indicate that taking advantage of double-matching leads to superior signal estimation performance compared to existing multi-view data decomposition based on single-matching. We apply our method to miRNA data as well as data from the English Premier League soccer matches and find joint and individual multi-view signals that align with domain-specific knowledge. Supplementary materials for this article are available online.
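To make the notion of double-matching concrete, here is a minimal sketch in Python with NumPy. It is not the authors' estimation algorithm. It only builds two noise-free toy matrices of the same shape (rows standing in for subjects, columns for miRNAs) and counts the shared directions in their column spaces and in their row spaces using principal angles. The sizes, ranks, signal strengths, and the helper names `orthonormal_basis` and `joint_dimension` are all assumptions made for illustration.

```python
import numpy as np


def orthonormal_basis(X, rank):
    """Return the top `rank` left singular vectors of X (a basis of its column space)."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank]


def joint_dimension(A, B, rank_a, rank_b, tol=1e-8):
    """Count principal angles between col(A) and col(B) that are numerically zero.

    The singular values of Ua^T Ub are the cosines of the principal angles,
    so a cosine equal to 1 marks a direction shared by both subspaces.
    """
    Ua = orthonormal_basis(A, rank_a)
    Ub = orthonormal_basis(B, rank_b)
    cosines = np.linalg.svd(Ua.T @ Ub, compute_uv=False)
    return int(np.sum(cosines > 1 - tol))


rng = np.random.default_rng(0)
n, p = 30, 20  # hypothetical sizes: n subjects, p features

# Orthonormal directions in subject space (length n) and feature space (length p).
Qn, _ = np.linalg.qr(rng.standard_normal((n, 3)))
Qp, _ = np.linalg.qr(rng.standard_normal((p, 3)))
u_joint, u_ind1, u_ind2 = Qn.T
v_joint, v_ind1, v_ind2 = Qp.T

# Two views matched by rows AND columns, so both have shape (n, p).
# Each view has one column-space direction shared with the other view (u_joint)
# and one row-space direction shared with the other view (v_joint).
X1 = 3.0 * np.outer(u_joint, v_ind1) + 2.0 * np.outer(u_ind1, v_joint)
X2 = 3.0 * np.outer(u_joint, v_ind2) + 2.0 * np.outer(u_ind2, v_joint)

# Shared directions across subjects (column spaces) and across features (row spaces).
joint_cols = joint_dimension(X1, X2, rank_a=2, rank_b=2)
joint_rows = joint_dimension(X1.T, X2.T, rank_a=2, rank_b=2)

print("joint column-space dimension:", joint_cols)
print("joint row-space dimension:", joint_rows)
```

With single-matched data only one of these two comparisons is meaningful, because the views then share only one index set. With real, noisy data the principal-angle cosines would not equal 1 exactly, so the ranks and the shared dimensions would have to be estimated instead of read off with a fixed tolerance.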
Title: Double-matched matrix decomposition for multi-view data