A general framework for integrative analysis of incomplete multiomics data.
Genetic Epidemiology (IF 1.7), Pub Date: 2020-07-21, DOI: 10.1002/gepi.22328
Dan-Yu Lin, Donglin Zeng, David Couper

There is a tremendous current interest in measuring multiple types of omics features (e.g., DNA sequences, RNA expressions, methylation profiles, metabolic profiles, protein expressions) on a large number of subjects. Although genotypes are typically available for all study subjects, other data types may be measured only on a subset of subjects due to cost or other constraints. In addition, quantitative omics measurements, such as metabolite levels and protein expressions, are subject to detection limits in that the measurements below (or above) certain thresholds are not detectable. In this article, we propose a rigorous and powerful approach to handle missing values and detection limits in integrative analysis of multiomics data. We relate quantitative omics variables to genetic variants and other variables through linear regression models and relate phenotypes to quantitative omics variables and other variables through generalized linear models. We derive the joint‐likelihood for the two sets of models by allowing arbitrary patterns of missing values and detection limits for quantitative omics variables. We carry out maximum‐likelihood estimation through computationally fast and stable algorithms. The resulting estimators are approximately unbiased and statistically efficient. An application to a major study on chronic obstructive lung disease yielded new biological insights.
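
The two sets of models and the joint likelihood described above can be sketched for the simplest case. The notation here is assumed for illustration and is not taken from the paper: one quantitative omics variable M_i with a lower detection limit d, genotype G_i, other covariates X_i, phenotype Y_i, normally distributed regression errors, and omics measurements that are missing at random given the observed data.

```latex
% Illustrative sketch only; all symbols (M_i, G_i, X_i, Y_i, d, alpha, beta, sigma)
% are assumed notation, not the authors' own.
% Linear regression for the omics variable and a GLM (link g) for the phenotype:
\begin{align*}
M_i &= \alpha_0 + \alpha_G^\top G_i + \alpha_X^\top X_i + \varepsilon_i,
      \qquad \varepsilon_i \sim N(0, \sigma^2), \\
g\{E(Y_i \mid M_i, G_i, X_i)\} &= \beta_0 + \beta_M M_i + \beta_G^\top G_i + \beta_X^\top X_i .
\end{align*}
% Write \mu_i = \alpha_0 + \alpha_G^\top G_i + \alpha_X^\top X_i, let
% p(y \mid m, G_i, X_i; \beta) be the GLM density and \phi the standard normal density.
% The likelihood contribution of subject i depends on what is known about M_i:
\[
L_i(\theta) =
\begin{cases}
p(Y_i \mid M_i, G_i, X_i; \beta)\, \sigma^{-1} \phi\{(M_i - \mu_i)/\sigma\},
  & M_i \text{ observed}, \\[4pt]
\displaystyle\int_{-\infty}^{d} p(Y_i \mid m, G_i, X_i; \beta)\, \sigma^{-1} \phi\{(m - \mu_i)/\sigma\}\, dm,
  & M_i \text{ below the detection limit } d, \\[8pt]
\displaystyle\int_{-\infty}^{\infty} p(Y_i \mid m, G_i, X_i; \beta)\, \sigma^{-1} \phi\{(m - \mu_i)/\sigma\}\, dm,
  & M_i \text{ not measured}.
\end{cases}
\]
% The joint likelihood is the product over subjects, with
% \theta = (\alpha_0, \alpha_G, \alpha_X, \sigma^2, \beta_0, \beta_M, \beta_G, \beta_X):
\[
L(\theta) = \prod_{i=1}^{n} L_i(\theta).
\]
```

In this sketch a subject with an unmeasured or undetectable omics value still contributes information through the genotype and phenotype, because the unknown value is integrated out rather than discarded or imputed with a fixed number. The framework in the abstract is more general: it covers several omics variables at once and arbitrary patterns of missing values and detection limits.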

Updated: 2020-09-11