A Bayesian model selection approach for identifying differentially expressed transcripts from RNA sequencing data.
The Journal of the Royal Statistical Society: Series C (Applied Statistics) (IF 1.6), Pub Date: 2017-02-07, DOI: 10.1111/rssc.12213
Panagiotis Papastamoulis, Magnus Rattray

Recent advances in molecular biology allow the quantification of the transcriptome and the scoring of transcripts as differentially or equally expressed between two biological conditions. Although these two tasks are closely linked, the available inference methods treat them separately: a primary model is used to estimate expression and its output is post-processed by a differential expression model. In this paper, both issues are addressed simultaneously by proposing the joint estimation of expression levels and differential expression: the unknown relative abundance of each transcript can either be equal or not between the two conditions. A hierarchical Bayesian model builds on the BitSeq framework, and the posterior distribution of transcript expression and differential expression is inferred by Markov chain Monte Carlo sampling. It is shown that the proposed model enjoys conjugacy for fixed-dimension variables; thus the full conditional distributions are derived analytically. Two samplers are constructed, a reversible jump Markov chain Monte Carlo sampler and a collapsed Gibbs sampler, and the latter is found to perform better. A cluster representation of the reads aligned to the transcriptome is introduced, allowing parallel estimation of the marginal posterior distribution of subsets of transcripts within reasonable computing time. Under a fixed prior probability of differential expression, the clusterwise sampler has the same marginal posterior distributions as the raw sampler, but a more general prior structure is also employed. The proposed algorithm is benchmarked against alternative methods on synthetic data sets and applied to real RNA sequencing data. Source code is available online from https://github.com/mqbssppe/cjBitSeq.
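To make the idea of choosing between "equal" and "differential" expression under a conjugate prior concrete, here is a minimal toy sketch. It is not the cjBitSeq algorithm and does not use the BitSeq model: it assumes a single transcript whose reads are already unambiguously assigned, a Beta(a, b) prior on its relative abundance, and a fixed prior probability of differential expression. The function name `posterior_prob_de` and all of its parameters are hypothetical and chosen for illustration only. Because the prior is conjugate, both marginal likelihoods are available in closed form and no sampling is needed in this simplified setting.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def posterior_prob_de(x1, n1, x2, n2, prior_de=0.5, a=1.0, b=1.0):
    """Toy posterior probability that a transcript is differentially expressed.

    x1 of n1 reads map to the transcript in condition 1, x2 of n2 in
    condition 2. Illustrative assumptions (not taken from the paper):
    binomial read counts, a Beta(a, b) prior on the relative abundance,
    and 0 < prior_de < 1.
    """
    # Equal expression: one shared abundance generates both samples.
    log_m0 = log_beta(a + x1 + x2, b + (n1 - x1) + (n2 - x2)) - log_beta(a, b)
    # Differential expression: an independent abundance per condition.
    log_m1 = (log_beta(a + x1, b + n1 - x1) - log_beta(a, b)
              + log_beta(a + x2, b + n2 - x2) - log_beta(a, b))
    # The binomial coefficients are identical under both models and cancel.
    log_odds = log(prior_de) - log(1.0 - prior_de) + log_m1 - log_m0
    # Numerically stable logistic transform of the posterior log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + exp(-log_odds))
    e = exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    print(posterior_prob_de(50, 100, 50, 100))  # similar proportions
    print(posterior_prob_de(10, 100, 90, 100))  # very different proportions
```

In this toy setting, identical read proportions in the two conditions pull the posterior probability below the prior, while strongly different proportions push it towards one. The model described in the abstract additionally has to handle reads that map ambiguously to several transcripts, which is where Markov chain Monte Carlo sampling comes in.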

Updated: 2019-11-01