Integrative analysis of time course metabolic data and biomarker discovery.
BMC Bioinformatics (IF 3) Pub Date: 2020-01-09, DOI: 10.1186/s12859-019-3333-0
Takoua Jendoubi 1, 2 , Timothy M D Ebbels 3

BACKGROUND Metabolomics time-course experiments provide the opportunity to understand the changes to an organism by observing the evolution of metabolic profiles in response to internal or external stimuli. Along with other omic longitudinal profiling technologies, these techniques have great potential to uncover complex relations between variations across diverse omic variables and provide unique insights into the underlying biology of the system. However, many statistical methods currently used to analyse short time-series omic data i) are prone to overfitting, ii) do not fully take into account the experimental design, iii) do not make full use of the multivariate information intrinsic to the data or iv) are unable to uncover multiple associations between different omic data. The model we propose is an attempt to i) overcome overfitting by using a weakly informative Bayesian model, ii) capture experimental design conditions through a mixed-effects model, iii) model interdependencies between variables by augmenting the mixed-effects model with a conditional auto-regressive (CAR) component and iv) identify potential associations between heterogeneous omic variables by using a horseshoe prior.
RESULTS We assess the performance of our model on synthetic and real datasets and show that it can outperform comparable models for metabolomic longitudinal data analysis. In addition, our proposed method gives the analyst new insights into the data, as it is able to identify metabolic biomarkers related to treatment, infer pathways perturbed as a result of treatment and find significant associations with additional omic variables. We also show through simulation that our model is fairly robust against inaccuracies in metabolite assignments. On real data, we demonstrate that the number of profiled metabolites slightly affects the predictive ability of the model.
CONCLUSIONS Our single-model approach to longitudinal analysis of metabolomics data supports integrative analysis and biomarker discovery simultaneously. In addition, it lends itself to better interpretation by allowing analysis at the pathway level. An accompanying R package for the model has been developed using the probabilistic programming language Stan. The package offers user-friendly functions for simulating data, fitting the model, assessing model fit and postprocessing the results. The main aim of the R package is to offer metabolomics scientists freely accessible resources for integrative longitudinal analysis, along with easy-to-use visualization functions that help applied researchers interpret results.
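
Two of the building blocks named in the abstract, the CAR component and the horseshoe prior, have standard textbook forms that can be illustrated in a few lines. The sketch below is not the authors' model or their R package: it assumes the common "proper CAR" precision matrix Q = D - rho * W and the usual horseshoe hierarchy (beta_j ~ Normal(0, tau^2 * lambda_j^2) with lambda_j ~ half-Cauchy(0, 1)). The function names, the toy adjacency matrix and the parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def car_precision(adjacency, rho):
    """Precision matrix Q = D - rho * W of a proper CAR prior.

    adjacency: symmetric 0/1 matrix W (here, a hypothetical graph linking
    metabolites that are assumed to share a pathway).
    rho: dependence parameter; |rho| < 1 keeps Q positive definite when
    every node has at least one neighbour.
    """
    n = len(adjacency)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(adjacency[i])  # D is diagonal, holding neighbour counts
        for j in range(n):
            if i == j:
                q[i][j] = float(degree)
            else:
                q[i][j] = -rho * adjacency[i][j]
    return q


def half_cauchy(rng, scale=1.0):
    """Draw from a half-Cauchy(0, scale) via the inverse CDF of a Cauchy."""
    return abs(scale * math.tan(math.pi * (rng.random() - 0.5)))


def horseshoe_draw(rng, n_coef, tau):
    """Draw coefficients from a horseshoe prior with global scale tau.

    Each coefficient gets its own heavy-tailed local scale lambda_j, so most
    draws are shrunk toward zero while a few can remain large.
    """
    return [rng.gauss(0.0, tau * half_cauchy(rng)) for _ in range(n_coef)]


# Toy example (assumed values): three metabolites on a chain 0 - 1 - 2.
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
Q = car_precision(W, rho=0.5)
betas = horseshoe_draw(random.Random(0), n_coef=10, tau=0.1)
```

In a full Bayesian fit these pieces would appear as priors inside the model rather than as forward simulations; the sketch only shows how neighbouring variables are coupled through Q and how the horseshoe combines one global scale with per-coefficient local scales.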

Updated: 2020-01-09