Fully Bayesian analysis of RNA-seq counts for the detection of gene expression heterosis
Journal of the American Statistical Association (IF 3.0), Pub Date: 2018-11-13, DOI: 10.1080/01621459.2018.1497496
Will Landau, Jarad Niemi, Dan Nettleton

ABSTRACT Heterosis, or hybrid vigor, is the enhancement of the phenotype of hybrid progeny relative to their inbred parents. Heterosis is extensively used in agriculture, yet the underlying mechanisms are unclear. To investigate the molecular basis of phenotypic heterosis, researchers search tens of thousands of genes for heterosis with respect to expression in the transcriptome. Difficulty arises in the assessment of heterosis due to composite null hypotheses and nonuniform distributions for p-values under these null hypotheses. Thus, we develop a general hierarchical model for count data and a fully Bayesian analysis in which an efficient parallelized Markov chain Monte Carlo algorithm ameliorates the computational burden. We use our method to detect gene expression heterosis in a two-hybrid plant-breeding scenario, both in a real RNA-seq maize dataset and in simulation studies. In the simulation studies, we show our method has well-calibrated posterior probabilities and credible intervals when the model assumed in analysis matches the model used to simulate the data. Although model misspecification can adversely affect calibration, the methodology is still able to accurately rank genes. Finally, we show that hyperparameter posteriors are extremely narrow and that an empirical Bayes (eBayes) approach based on posterior means from the fully Bayesian analysis provides virtually equivalent posterior probabilities, credible intervals, and gene rankings relative to the fully Bayesian solution. This evidence of equivalence provides support for the use of eBayes procedures in RNA-seq data analysis if accurate hyperparameter estimates can be obtained. Supplementary materials for this article are available online.
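
As a rough illustration of why a Bayesian treatment sidesteps the composite-null problem, the sketch below estimates a posterior probability of heterosis for one gene directly from posterior draws. It is not the paper's model or code: the function name, the use of high-parent heterosis (hybrid mean expression above both parental means) as the event of interest, and the synthetic normal draws standing in for MCMC output are all assumptions made for this example.

```python
import numpy as np


def prob_high_parent_heterosis(parent1, parent2, hybrid):
    """Estimate P(hybrid mean > both parental means | data) for one gene.

    Each argument is a 1-D array of posterior draws (one entry per MCMC
    iteration) of that group's mean expression on a common scale.
    The estimate is the fraction of draws in which the event holds.
    """
    parent1, parent2, hybrid = (np.asarray(a, dtype=float)
                                for a in (parent1, parent2, hybrid))
    return float(np.mean(hybrid > np.maximum(parent1, parent2)))


# Synthetic stand-ins for MCMC output (assumption: not real posterior draws).
rng = np.random.default_rng(0)
p1 = rng.normal(3.0, 0.2, size=5000)   # hypothetical parent 1 log-mean draws
p2 = rng.normal(3.2, 0.2, size=5000)   # hypothetical parent 2 log-mean draws
hy = rng.normal(3.8, 0.2, size=5000)   # hypothetical hybrid log-mean draws

prob = prob_high_parent_heterosis(p1, p2, hy)
print(f"Estimated posterior probability of high-parent heterosis: {prob:.3f}")
```

Because the probability is computed for the event itself, no reference distribution under a composite null is needed; genes can then be ranked by this quantity, which is the kind of ranking the abstract refers to.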

Updated: 2018-11-13