Multiple Response Regression for Gaussian Mixture Models with Known Labels.
Statistical Analysis and Data Mining (IF 1.3), Pub Date: 2012-10-29, DOI: 10.1002/sam.11158
Wonyul Lee, Ying Du, Wei Sun, D Neil Hayes, Yufeng Liu

Multiple response regression is a useful regression technique to model multiple response variables using the same set of predictor variables. Most existing methods for multiple response regression are designed for modeling homogeneous data. In many applications, however, one may have heterogeneous data where the samples are divided into multiple groups. Our motivating example is a cancer dataset where the samples belong to multiple cancer subtypes. In this paper, we consider modeling the data coming from a mixture of several Gaussian distributions with known group labels. A naive approach is to split the data into several groups according to the labels and model each group separately. Although it is simple, this approach ignores potential common structures across different groups. We propose new penalized methods to model all groups jointly in which the common and unique structures can be identified. The proposed methods estimate the regression coefficient matrix, as well as the conditional inverse covariance matrix of response variables. Asymptotic properties of the proposed methods are explored. Through numerical examples, we demonstrate that both estimation and prediction can be improved by modeling all groups jointly using the proposed methods. An application to a glioblastoma cancer dataset reveals some interesting common and unique gene relationships across different cancer subtypes. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012
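
The naive baseline mentioned in the abstract (split the samples by their known labels and model each group on its own) can be sketched as follows. This is only an illustration of that baseline, not the authors' penalized joint method, whose objective is not given here. The function name `fit_groups_separately` and the dictionary keys are hypothetical, the intercept is omitted, and each group is assumed to have more samples than predictors.

```python
import numpy as np

def fit_groups_separately(X, Y, labels):
    """Naive baseline: one multi-response least-squares fit per known group label.

    X: (n, p) predictors, Y: (n, q) responses, labels: (n,) group labels.
    Returns a dict mapping each label to its coefficient matrix and an
    estimate of the conditional inverse covariance of the responses.
    """
    X, Y, labels = np.asarray(X), np.asarray(Y), np.asarray(labels)
    fits = {}
    for g in np.unique(labels):
        mask = labels == g
        Xg, Yg = X[mask], Y[mask]
        # Coefficient matrix (p x q) minimizing the Frobenius norm of Yg - Xg @ B.
        B, *_ = np.linalg.lstsq(Xg, Yg, rcond=None)
        resid = Yg - Xg @ B
        # Residual covariance (q x q) with a degrees-of-freedom correction.
        dof = max(Xg.shape[0] - Xg.shape[1], 1)
        cov = resid.T @ resid / dof
        # Pseudo-inverse as an unpenalized precision estimate (an assumption of
        # this sketch; it is unstable when q is large relative to the group size).
        fits[g.item()] = {"B": B, "precision": np.linalg.pinv(cov)}
    return fits
```

Because every group is fit in isolation, nothing in this sketch can borrow strength from structure shared across groups, which is the limitation the abstract says the joint penalized methods are meant to address.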

Updated: 2012-10-29