A variable selection approach in the multivariate linear model: an application to LC-MS metabolomics data.
Statistical Applications in Genetics and Molecular Biology (IF 0.9), Pub Date: 2018-09-13, DOI: 10.1515/sagmb-2017-0077
Marie Perrot-Dockès 1, Céline Lévy-Leduc 1, Julien Chiquet 1, Laure Sansonnet 1, Margaux Brégère 1, Marie-Pierre Étienne 1, Stéphane Robin 1, Grégory Genta-Jouve 2

Omic data are characterized by strong dependence structures that result either from data acquisition or from underlying biological processes. Statistical procedures whose variable selection step does not adjust for this dependence pattern may lose power and select spurious variables. The goal of this paper is to propose a variable selection procedure within the multivariate linear model framework that accounts for the dependence between the multiple responses. We focus on a specific type of dependence, in which the responses of a given individual are modelled as a time series. We propose a novel Lasso-based approach within the multivariate linear model framework that takes the dependence structure into account by using the covariance structures of different types of stationary processes for the random error matrix. Our numerical experiments show that including an estimate of the covariance matrix of the random error matrix in the Lasso criterion dramatically improves the variable selection performance. Our approach is successfully applied to an untargeted LC-MS (Liquid Chromatography-Mass Spectrometry) data set of African copal samples. Our methodology is implemented in the R package MultiVarSel, which is available from the Comprehensive R Archive Network (CRAN).
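
The idea in the abstract, estimating the error covariance and building it into the Lasso criterion, can be illustrated with a minimal Python sketch. It is not the MultiVarSel API, and every function name in it is hypothetical. It assumes the simplest stationary case, an AR(1) process along the responses of each individual. The AR(1) coefficient is estimated from ordinary least-squares residuals, the responses are whitened, and a Lasso is fitted to the vectorized model. The plain coordinate-descent solver, the fixed penalty `lam`, and the one-hot group design in the demo are illustrative choices, not taken from the paper.

```python
import numpy as np


def ar1_whitening_matrix(phi, q):
    """Return W such that e @ W is white noise when the row vector e follows
    a stationary AR(1) process with coefficient phi and unit innovation variance."""
    W = np.eye(q)
    W[0, 0] = np.sqrt(1.0 - phi ** 2)
    idx = np.arange(q - 1)
    W[idx, idx + 1] = -phi  # column t becomes e_t - phi * e_{t-1}
    return W


def estimate_ar1_coefficient(residuals):
    """Pooled lag-1 autocorrelation across rows (one row per individual)."""
    num = np.sum(residuals[:, 1:] * residuals[:, :-1])
    den = np.sum(residuals ** 2)
    return float(np.clip(num / den, -0.99, 0.99))


def lasso_coordinate_descent(A, y, lam, n_iter=100):
    """Minimise 0.5 * ||y - A b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    b = np.zeros(A.shape[1])
    col_sq = np.sum(A ** 2, axis=0)
    r = np.asarray(y, dtype=float).copy()  # current residual y - A b
    for _ in range(n_iter):
        for j in range(A.shape[1]):
            r += A[:, j] * b[j]            # remove coordinate j from the fit
            rho = A[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= A[:, j] * b[j]            # add the updated coordinate back
    return b


def whitened_lasso(X, Y, lam):
    """Lasso on Y = X B + E after whitening the (assumed AR(1)) rows of E."""
    n, q = Y.shape
    p = X.shape[1]
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    phi_hat = estimate_ar1_coefficient(Y - X @ B_ols)
    W = ar1_whitening_matrix(phi_hat, q)
    # Column-major vec: vec(Y W) = (W^T kron X) vec(B) + vec(E W)
    y_vec = (Y @ W).flatten(order="F")
    A = np.kron(W.T, X)
    b = lasso_coordinate_descent(A, y_vec, lam)
    return b.reshape((p, q), order="F"), phi_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, q, phi = 30, 3, 20, 0.7
    X = np.eye(p)[np.arange(n) % p]        # one-hot group design (assumption)
    B = np.zeros((p, q))
    B[0, 2], B[1, 10], B[2, 15] = 3.0, -3.0, 2.0
    E = np.empty((n, q))
    E[:, 0] = rng.standard_normal(n) / np.sqrt(1.0 - phi ** 2)
    for t in range(1, q):
        E[:, t] = phi * E[:, t - 1] + rng.standard_normal(n)
    B_hat, phi_hat = whitened_lasso(X, X @ B + E, lam=8.0)
    print("estimated phi:", round(phi_hat, 2))
    print("selected (row, column) pairs:", np.argwhere(B_hat != 0).tolist())
```

Because the whitening matrix multiplies the responses on the right, the vectorized design is a Kronecker product and the penalty still acts directly on the entries of the coefficient matrix, so the nonzero entries of `B_hat` can be read as selected (predictor, response) pairs. In practice the penalty would be tuned rather than fixed as it is here.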

Updated: 2019-11-01