Bayesian estimation of differential transcript usage from RNA-seq data

Panagiotis Papastamoulis; Magnus Rattray

doi:10.1515/sagmb-2017-0005

Open Access Published by De Gruyter November 1, 2017

From the journal Statistical Applications in Genetics and Molecular Biology

https://doi.org/10.1515/sagmb-2017-0005

Abstract

Next generation sequencing allows the identification of genes consisting of differentially expressed transcripts, a term which usually refers to changes in the overall expression level. A specific type of differential expression is differential transcript usage (DTU), which targets changes in the relative within gene expression of a transcript. The contribution of this paper is to: (a) extend cjBitSeq, a previously introduced Bayesian model originally designed for identifying changes in overall expression levels, to the DTU context and (b) propose a Bayesian version of DRIMSeq, a frequentist model for inferring DTU. cjBitSeq is a read-based model and performs fully Bayesian inference by MCMC sampling on the space of latent states of each transcript per gene. BayesDRIMSeq is a count-based model and estimates the Bayes factor of a DTU model against a null model using Laplace’s approximation. The proposed models are benchmarked against the existing ones using a recent independent simulation study as well as a real RNA-seq dataset. Our results suggest that the Bayesian methods exhibit similar performance to DRIMSeq in terms of precision/recall but offer better calibration of the False Discovery Rate.

Keywords: alternative splicing; false discovery rate; Laplace approximation; MCMC; within gene transcript expression

1 Introduction

High throughput sequencing of cDNA (RNA-seq) (Mortazavi et al., 2008) is an important tool to quantify transcript expression levels and to identify differences between different biological conditions. RNA-seq experiments produce a large number (millions) of short reads (nucleotide sequences) which are typically mapped to the genome or transcriptome. Expression quantification requires estimating the number of reads originating from each transcript in a given sample. Quantifying the transcriptome in different samples allows the identification of differentially expressed (DE) transcripts between them. However, certain difficulties complicate the inference procedure. In higher eukaryotes, most genes are spliced into alternative transcripts which share specific parts of their sequence (exons). Hence, a given short read typically aligns to different positions of the transcriptome and statistical models are often used to infer the origin probabilistically (Trapnell et al., 2010, 2013; Li and Dewey, 2011; Nicolae et al., 2011; Glaus et al., 2012; Rossell et al., 2014; Hensman et al., 2015).

Differential transcript expression (DTE) refers to the event where the overall relative expression of a transcript changes between two conditions. In this case, $\theta_k$ refers to the relative expression of transcript $k$, $k = 1, \ldots, K$, with respect to the whole set of transcripts, with $\theta_k \geq 0$ and $\sum_{k=1}^{K}\theta_k = 1$. In contrast, differential transcript usage (DTU) refers to the event that the relative within gene abundance of a transcript changes between conditions. Consider a gene $g = 1, \ldots, G$ with $K_g > 1$ transcripts. Then, the relative within gene transcript abundance is defined as $\theta_k^{(g)} = \theta_k / \sum_{j \in g}\theta_j$. Obviously, if a transcript belongs to a gene with $K_g = 1$ then it is always non-DTU. According to Gonzàlez-Porta et al. (2013) the dominant transcripts within a gene are likely to be the main contributors to the proteome, and switching events between them are a common scenario of gene modification between conditions.
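The distinction between overall and within gene relative abundance can be made concrete with a small numerical sketch. The transcript names, gene assignments and abundance values below are hypothetical and chosen only for illustration; the function simply renormalizes each transcript's abundance by the total of its gene.

```python
from collections import defaultdict


def within_gene_abundance(theta, gene_of):
    """Divide each transcript's abundance by the total abundance of its gene."""
    totals = defaultdict(float)
    for transcript, value in theta.items():
        totals[gene_of[transcript]] += value
    return {
        transcript: (value / totals[gene_of[transcript]]
                     if totals[gene_of[transcript]] > 0 else 0.0)
        for transcript, value in theta.items()
    }


# Hypothetical overall relative abundances in two conditions (each sums to 1).
gene_of = {"t1": "g1", "t2": "g1", "t3": "g2", "t4": "g2"}
theta_a = {"t1": 0.10, "t2": 0.30, "t3": 0.36, "t4": 0.24}
theta_b = {"t1": 0.20, "t2": 0.60, "t3": 0.08, "t4": 0.12}

usage_a = within_gene_abundance(theta_a, gene_of)
usage_b = within_gene_abundance(theta_b, gene_of)

for t in sorted(gene_of):
    print(t, gene_of[t], round(usage_a[t], 3), round(usage_b[t], 3))
```

In this toy input all four transcripts change their overall abundance, but the within gene proportions of gene g1 are identical in both conditions, whereas the two transcripts of gene g2 swap their relative order.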

Figure 1 illustrates the differences between DTE and DTU, considering a set of three genes (shown in red, blue and green) consisting of 2, 2 and 3 transcripts. In the case of DTE (upper panel) the overall expression of transcripts 1, 2, 6 and 7 changes: in particular transcripts 1 and 2 are up-regulated in condition A while transcripts 6 and 7 are up-regulated in condition B. In the lower panel of Figure 1 note that only transcripts 6 and 7 are DTE. However, also note that now the relative expression of these transcripts conditionally on the set of the same-gene transcripts (green color) is not the same between conditions. In general, DTU implies DTE but the reverse is not necessarily true.

Figure 1:

Differential transcript expression (up) and differential transcript usage (down).

In this paper we extend the use of two available methods in order to perform Bayesian inference for the problem of DTU. cjBitSeq (Papastamoulis and Rattray, 2017) was originally introduced as a Bayesian read-based model for DTE inference and here we modify it for the DTU problem. We also propose a Bayesian version of DRIMSeq (Nowicka and Robinson, 2016), a count-based approach originally introduced as a frequentist model for DTU inference. Genome-scale studies involve a large number of simultaneous tests, typically on the order of tens of thousands. A crucial issue under a multiple comparisons framework is the control of the False Discovery Rate (FDR), that is, the expected proportion of errors among the rejected hypotheses (Benjamini and Hochberg, 1995). According to a recent benchmarking study (Soneson et al., 2015), the ability of frequentist count-based methods to control the FDR is drastically improved by pre-filtering low-expressed transcripts. This remains true for the Bayesian version of the count-based method presented here (BayesDRIMSeq). However, it is not possible to incorporate such a strategy for read-based methods (cjBitSeq) where transcript expression levels are not known a priori. Therefore, under our Bayesian framework, we also propose the use of transformations of the raw posterior probabilities and filtering the output based on the notion of trust regions, which are motivated by realistic scenarios of gene regulation (Gonzàlez-Porta et al., 2013).

The rest of the paper is organized as follows. In Section 2 we briefly describe existing methods. The proposed Bayesian models are presented in Section 3. More specifically, Section 3.1 reviews the cjBitSeq framework and also introduces the necessary prior modifications for the problem of DTU. The likelihood of the DRIMSeq model is presented in Section 3.2 and a Bayesian version is introduced next, along with a detailed description of the inference. Section 4 deals with FDR control procedures. In Section 5 we report our findings on synthetic data using the carefully designed simulation study of Soneson et al. (2015). In Section 5.1 we compare cjBitSeq and BayesDRIMSeq with respect to the decision rules of Section 4 using power versus achieved FDR plots. In Section 5.2 we benchmark these methods against existing ones and we also report more performance measures, such as ROC and precision/recall curves as well as comparisons in terms of run-time and memory requirements. A real RNA-seq dataset is analysed in Section 6. The manuscript concludes with a Discussion. A prior sensitivity analysis of BayesDRIMSeq as well as a comparison between alternative inputs of BayesDRIMSeq and DRIMSeq based on different quantification methods is provided in the Appendix.

2 Existing methods

cuffdiff

The cufflinks/cuffdiff (Trapnell et al., 2010, 2013) pipeline estimates the expression of a set of transcripts and then performs various differential expression tests both on the transcript and gene level. DTU at the gene level is based on comparing the similarity of two distributions using the square root of the Jensen-Shannon divergence (Osterreicher and Vajda, 2003; Endres and Schindelin, 2003). Following Soneson et al. (2015), we used the gene-wise FDR estimates from the cds.diff output file of cuffdiff (version 2.2.1).

DEXSeq

DEXSeq (Anders et al., 2012) is the most popular method for inferring DTU. The genome is divided into disjoint parts of exons (counting bins) and a matrix of read counts into the counting bins is used as input. The default method for counting reads for this purpose is HTSeq (Anders et al., 2015). Given the estimated reads from HTSeq, a negative binomial generalized linear model is fit and DTU is inferred by testing whether the interaction term between conditions is different from zero.

DRIMSeq

This recent package (Nowicka and Robinson, 2016) implements a Dirichlet-multinomial model in order to describe the variability between replicates. A likelihood ratio test is performed in order to compare a full model with distinct parameters per condition and a null model which assumes that the parameters are shared. The input is a matrix of counts per transcript. We applied this method using the following filtering criteria:

min_gene_expr = 1 (Minimal gene expression in cpm)
min_feature_prop = 0.01 (Minimal proportion for feature expression)
min_samps_gene_expr = 3 (Minimal number of samples where genes should be expressed)
min_samps_feature_prop = 3 (Minimal number of samples where features should be expressed)

edgeR

The function spliceVariants from the edgeR (Robinson et al., 2010) package can be used to identify genes showing evidence of splice variation using negative binomial generalized linear models. For each gene (containing at least two transcripts) a likelihood ratio test compares a model with an interaction term between each condition against a null model with no interaction term. The input corresponds to a matrix of counts per transcript.

limma

The function diffSplice from the limma (Ritchie et al., 2015) package also tests for DTU by fitting negative binomial generalized linear models and performing a likelihood ratio test at the difference of log-fold changes. The input corresponds to a matrix of counts per transcript.

3 New Bayesian approaches

cjBitSeq was originally applied to the problem of inferring transcripts with DTE and here this model is modified for the problem of DTU. DRIMSeq is a frequentist approach to the problem of DTU and this model is now extended under a Bayesian framework. cjBitSeq is a read-based model, that is, the observed data is a matrix of alignments of each read to the transcriptome. On the other hand, DRIMSeq is a count-based model, which uses as input a matrix of (estimated) counts corresponding to the number of reads originating from each transcript. Both methods report an estimate of the posterior probability of DTU per gene. Bayesian DRIMSeq estimates the Bayes factor between a DTU and a null model. cjBitSeq performs collapsed Gibbs sampling on the space of latent states of each transcript, that is, a binary vector with 0 corresponding to equally expressed (EE) transcripts and 1 otherwise. Therefore, cjBitSeq also reports a posterior probability of DTU for each transcript, which may be of interest for transcript-level analysis. In this study we focus our attention on the gene-level summaries, as done in Soneson et al. (2015).

Both models take advantage of distributions with richer covariance structures compared to standard sampling schemes: in particular, the generalized Dirichlet distribution arises as a full conditional distribution in the cjBitSeq model, while DRIMSeq is based on the Dirichlet-Multinomial distribution. The generalized Dirichlet distribution allows for positive correlations between proportions, which is not the case for a standard Dirichlet model, and the Dirichlet-Multinomial distribution exhibits extra variation compared to a multinomial model. Interestingly, we note that both distributions were introduced by the same author (Mosimann, 1962; Connor and Mosimann, 1969).

3.1 cjBitSeq

Let $x = (x_1, \ldots, x_r)$, $x_i \in \mathcal{X}$, $i = 1, \ldots, r$, denote a sample of $r$ short reads aligned to a given set of $K$ transcripts. The sample space $\mathcal{X}$ consists of all sequences of letters A, C, G, T. Assuming that reads are independent, the joint probability density function of the data is written as

$$x \mid \boldsymbol{\theta} \sim \prod_{i=1}^{r}\sum_{k=1}^{K}\theta_k f_k(x_i). \tag{1}$$

The number of components ($K$) is equal to the number of transcripts and it is considered as known since the transcriptome is given. The parameter vector $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_K) \in \mathcal{P}_{K-1}$ denotes relative abundances, where

$$\mathcal{P}_{K-1} := \left\{p_k \geq 0,\ k = 1, \ldots, K-1 : \sum_{k=1}^{K-1} p_k \leq 1;\ p_K := 1 - \sum_{k=1}^{K-1} p_k\right\}.$$

The component specific density $f_k(\cdot)$ corresponds to the probability of a read aligning at some position of transcript $k$, $k = 1, \ldots, K$. Since we assume a known transcriptome, $\{f_k\}_{k=1}^{K}$ are known as well and they are computed according to the methodology described in Glaus et al. (2012), taking into account optional position and sequence-specific bias correction methods.

Papastamoulis and Rattray (2017) proposed a Bayesian model selection approach for identifying differentially expressed transcripts from RNA-seq data. The method builds upon the BitSeq model (Glaus et al., 2012; Papastamoulis et al., 2014; Hensman et al., 2015). Compared to other approaches, the main difference of cjBitSeq is that transcript expression and differential expression are jointly modelled. In contrast to other methods where the starting point of the DE analysis is a count matrix, the input of cjBitSeq is the matrix $L$ containing alignment probabilities of each read to the transcriptome. According to Equation (1), the probability of read $i$ aligning to transcript $k$ is given by $L_{ik} = f_k(x_i)$ for $i = 1, \ldots, r$ and $k = 1, \ldots, K$.
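As an illustration of how the alignment probability matrix enters the mixture in Equation (1), the following minimal sketch evaluates the log-likelihood for a given abundance vector. It is not the cjBitSeq implementation: the tiny matrix `L` and the vector `theta` are made-up values, and the function name is hypothetical.

```python
import math


def mixture_log_likelihood(L, theta):
    """Log of prod_i sum_k theta_k * L[i][k] for independent reads."""
    if abs(sum(theta) - 1.0) > 1e-9:
        raise ValueError("theta must sum to one")
    total = 0.0
    for row in L:
        # Density of one read: abundance-weighted sum over transcripts.
        read_density = sum(t * l for t, l in zip(theta, row))
        total += math.log(read_density)
    return total


# Hypothetical alignment probabilities for 3 reads and 2 transcripts.
# A zero entry means the read does not align to that transcript.
L = [
    [0.9, 0.0],
    [0.5, 0.5],
    [0.0, 0.8],
]
theta = [0.5, 0.5]
log_lik = mixture_log_likelihood(L, theta)
print(log_lik)
```

A read whose alignment probabilities are zero for every transcript with positive abundance would make the logarithm undefined; such reads are assumed to have been discarded before this step.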

Assume that we have at hand two samples $x := (x_1, \ldots, x_r)$ and $y := (y_1, \ldots, y_s)$, with $r$ and $s$ denoting the number of (mapped) reads for sample $x$ and $y$, respectively. Now, let $\theta_k$ and $w_k$ denote the unknown relative abundance of transcript $k = 1, \ldots, K$ in sample $x$ and $y$, respectively. Define the parameter vectors of relative abundances as $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_{K-1}; \theta_K) \in \mathcal{P}_{K-1}$ and $w = (w_1, \ldots, w_{K-1}; w_K) \in \mathcal{P}_{K-1}$. Under the standard BitSeq model the prior on the parameters $\boldsymbol{\theta}$ and $w$ would be a product of independent Dirichlet distributions. In this case the probability of the event $\theta_k = w_k$ under the prior is zero and it is not straightforward to define non-DE transcripts. To model differential expression we would instead like to identify instances where transcript expression has not changed between samples. Therefore, we introduce a finite probability for the event $\theta_k = w_k$. This leads us to define a new model with a non-independent prior for the parameters $\boldsymbol{\theta}$ and $w$.

Definition 1 (State vector).

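Let $c := (c_1, \ldots, c_K) \in C$, where $C$ is the set defined by:

$c_k \in \{0, 1\}$, $k = 1, \ldots, K$
$c_+ := \sum_{k=1}^{K} c_k \neq 1$.

Then, for $k = 1, \ldots, K$ let:

$$\begin{cases}\theta_k = w_k, & \text{if } c_k = 0\\ \theta_k \neq w_k, & \text{if } c_k = 1.\end{cases}$$

We will refer to vector $c$ as the state vector of the model.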

cjBitSeq was originally applied to the problem of DTE by introducing a cluster representation of aligned reads to transcripts. This clustering approach substantially reduces the dimensionality of the sampling space and makes the MCMC sampler converge in reasonable time. It is important to mention that clusters are defined under a data-driven algorithm, that is, by searching the alignments of each read and identifying groups of transcripts sharing reads.

Under the same approach, we would be able to infer clusters of transcripts with DTU. However, since in this work we focus on inference at the gene level, we impose the assumption that clusters are defined as the transcripts of each gene. Otherwise, in some instances it will not be straightforward to perform inference at the gene level, due to the possibility of clusters of transcripts merging multiple genes together. For example, we found that approximately 4.5% of mapped reads align to more than one gene in our simulation experiments of Section 5 using paired-end reads with length 101 base-pairs. In case that a read maps to more than one gene, we only keep the alignments corresponding to transcripts of the gene containing the best score for this specific read. Thus, the cjBitSeq algorithm is applied separately to each gene (consisting of at least two transcripts).

For the problem of DTU, cjBitSeq is applied under a modification in the prior distribution of DE per transcript. Under the Jeffreys’ prior, which is used in the default cjBitSeq setting, the probability of a gene consisting of DE transcripts is an increasing function of the number of transcripts. This prior is reasonable for a transcript-level analysis and it has been shown that it outperforms other choices. However, this choice introduces a prior bias in the case of DTU, since genes with a larger number of transcripts are assigned a larger prior probability of DTU than genes with a small number of transcripts. Therefore, it is now a priori assumed that the event of no differential expression within a gene is equally weighted with the event that at least two transcripts exhibit DTU, that is, $\mathbb{P}(c_+ = 0) = 0.5$. An equal prior probability is assigned to each of the remaining possible configurations. Thus, the prior distribution on the state vector is defined as:

$$P(c) = P(c \mid c_+ \neq 1) = \begin{cases} 0.5, & c_+ = 0\\ \dfrac{0.5}{2^K - K - 1}, & c_+ \geq 2. \end{cases} \tag{2}$$

This modification is necessary in order to ensure that no prior bias is enforced at the gene-level which is the aim of the analysis in the DTU setup.
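The prior of Equation (2) can be checked numerically by enumerating all admissible state vectors for a small number of transcripts. The following is only an illustrative sketch with hypothetical function names: there are $2^K - K - 1$ configurations with at least two differentially used transcripts, so the prior masses should add up to one.

```python
from itertools import product


def state_prior(c):
    """Prior mass of a binary state vector c under Equation (2)."""
    K = len(c)
    c_plus = sum(c)
    if c_plus == 1:
        return 0.0  # excluded: a single transcript cannot change alone
    if c_plus == 0:
        return 0.5  # no differential usage within the gene
    return 0.5 / (2 ** K - K - 1)  # uniform over the remaining states


def admissible_states(K):
    """All binary vectors of length K whose sum differs from one."""
    return [c for c in product((0, 1), repeat=K) if sum(c) != 1]


for K in range(2, 7):
    states = admissible_states(K)
    total = sum(state_prior(c) for c in states)
    print(K, len(states), round(total, 12))
```

Whatever the number of transcripts, the null configuration keeps prior mass 0.5, which is the property that removes the dependence of the gene-level prior on $K$.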

A graphical model of the cjBitSeq prior assumptions is shown in Figure 2. The binary state vector $c = (c_1, \ldots, c_K)$ defines differentially or equally expressed transcripts within each gene. The prior distribution of $c$ is given by Equation (2), although in the general implementation of Papastamoulis and Rattray (2017) an extra level of hierarchy is imposed by the hyper-parameter $\pi$, shown in Figure 2. The parameters $u$ and $v$ are a priori independent Dirichlet random variables. The dimension of $u$ is equal to $K$, i.e. the number of transcripts for a given gene. On the other hand, $v$ is a random variable with varying dimension, which is defined by the number of differentially expressed transcripts, that is, $\sum_{k=1}^{K} c_k$. The parameters $u$ and $v$ along with an auxiliary parameter $\boldsymbol{\tau}$ define via a suitable one-to-one transformation the actual transcript expression parameters $\boldsymbol{\theta}$ and $w$. According to Theorem 1 of Papastamoulis and Rattray (2017), $\boldsymbol{\theta}$ and $w$ are marginally Dirichlet random variables; however, they are not independent since the probability of the events $\{\theta_k = w_k;\ k = 1, \ldots, K\}$ is positive. At the next level of hierarchy, the latent allocation variables $\boldsymbol{\xi}$ and $z$ define the transcript allocation of each read from sample $x$ and $y$, respectively, through the equations $P(\xi_i = k) = \theta_k$, independent for $i = 1, \ldots, r$, and $P(z_j = k) = w_k$, independent for $j = 1, \ldots, s$.

Figure 2:

Directed Acyclic Graph representation for the cjBitSeq model. Squares and circles represent unknown and observed/fixed quantities, respectively.

Papastamoulis and Rattray (2017) showed that the model is conjugate given $c$. Updating $(c, v)$, however, requires a reversible-jump mechanism (Green, 1995; Richardson and Green, 1997; Papastamoulis and Iliopoulos, 2009). This step can be avoided by analytical integration of $(u, v)$. Thus, a collapsed Gibbs sampler (Geman and Geman, 1984; Gelfand and Smith, 1990; Liu, 1994; Liu et al., 1995) updates the latent allocation variables ($\boldsymbol{\xi}$ and $z$) of each read to its transcript of origin as well as the binary variables $c_k$ of each transcript state (DE or EE). Let $x_{[-i]}$ denote the vector arising from $x$ after excluding its $i$-th entry. A pseudo-code description of the collapsed Gibbs MCMC sampler is:

1. Update allocation variables for sample $x$: $\xi_i \mid \boldsymbol{\xi}_{[-i]}, z, c, x, y$, $i = 1, \ldots, r$.
2. Update allocation variables for sample $y$: $z_j \mid \boldsymbol{\xi}, z_{[-j]}, c, x, y$, $j = 1, \ldots, s$.
3. Draw a random sample (without replacement) of indices $(j_1, j_2)$ from $\{1, \ldots, K\}$ and update the block of the state vector $c_{j_1, j_2} \mid c_{[-(j_1, j_2)]}, \boldsymbol{\xi}, z, x, y$.
4. Update $(\boldsymbol{\theta}, w, \boldsymbol{\tau}, u, v) \mid c, \boldsymbol{\xi}, z, x, y$ (optional).

Note that the fourth update is optional in the sense that it is not required by any of the previous steps; however, one can include it in order to also obtain MCMC samples of the transcript expression parameters $\boldsymbol{\theta}$ and $w$. For a detailed description of the conditional distributions involved in steps 1–4 (as well as the alternative RJMCMC sampler) see Papastamoulis and Rattray (2017).

According to our model, it is natural to call a gene DE if at least two transcripts exhibit DTU. Hence, the posterior probability of DTU for a gene $g$ is defined as

$$p_g = \mathbb{P}\{c_+ > 0 \mid x, y\}, \quad g = 1, \ldots, G, \tag{3}$$

and it is estimated by the corresponding ergodic average across the MCMC run (after burn-in).
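The ergodic average in Equation (3) amounts to the fraction of retained MCMC iterations in which the sampled state vector is not identically zero. A minimal sketch, assuming the sampled state vectors of one gene are available as a list of binary tuples (the sample values and the function name below are hypothetical):

```python
def posterior_dtu_probability(state_samples, burn_in):
    """Fraction of post burn-in iterations with at least one c_k equal to 1."""
    kept = state_samples[burn_in:]
    if not kept:
        raise ValueError("no iterations left after burn-in")
    return sum(1 for c in kept if sum(c) > 0) / len(kept)


# Hypothetical MCMC output for a gene with three transcripts.
samples = [
    (0, 0, 0),
    (1, 1, 0),
    (1, 1, 0),
    (0, 0, 0),
    (1, 0, 1),
    (1, 1, 1),
]
p_hat = posterior_dtu_probability(samples, burn_in=2)
print(p_hat)
```

Because states with exactly one non-zero entry have zero prior mass, the condition $c_+ > 0$ is equivalent to $c_+ \geq 2$ for any valid draw.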

3.2 BayesDRIMSeq

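Let $n = n_g$ denote the total number of reads aligning to a gene $g$ with $k$ transcripts, $g = 1, \ldots, G$. Assume that $X = X_g = (X_1, \ldots, X_k)$ is the vector of reads originating from each transcript, according to an underlying vector $\boldsymbol{\theta} = \boldsymbol{\theta}_g = (\theta_1, \ldots, \theta_k)$ of relative abundances which is unknown. A priori, a Dirichlet prior is imposed on $\boldsymbol{\theta}$ and, given $\boldsymbol{\theta}$, the observed reads are generated according to a multinomial distribution, that is,

$$\boldsymbol{\theta} \sim \mathcal{D}(\delta_1, \ldots, \delta_k), \qquad X \mid \boldsymbol{\theta} \sim \mathrm{Multinomial}(n, \boldsymbol{\theta}).$$

Integrating out $\boldsymbol{\theta}$, this model leads to the Dirichlet-Multinomial (Mosimann, 1962) distribution:

$$P(X = x) = \binom{n}{x}\frac{\Gamma(\delta_+)}{\Gamma(n + \delta_+)}\prod_{j=1}^{k}\frac{\Gamma(\delta_j + x_j)}{\Gamma(\delta_j)},$$

where the first term in the product denotes the multinomial coefficient and $\delta_+ = \sum_{j=1}^{k}\delta_j$. We will write: $X \mid n, \boldsymbol{\delta} \sim \mathcal{DM}(n, \boldsymbol{\delta})$. It can be shown that

$$\mathbb{E}X = n\boldsymbol{\pi}$$

and

$$\mathrm{Var}X = \left\{1 + \frac{n - 1}{\delta_+ + 1}\right\} n \left\{\mathrm{diag}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}'\right\},$$

where $\boldsymbol{\pi} = \{\delta_j/\delta_+;\ j = 1, \ldots, k-1\}$ and $\mathrm{diag}(\boldsymbol{\pi})$ denotes a diagonal matrix with diagonal entries equal to $\pi_1, \ldots, \pi_{k-1}$. Note that as $\delta_+ \to \infty$ the variance-covariance matrix of the Dirichlet-multinomial distribution reduces to $n\{\mathrm{diag}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}'\}$, that is, the variance-covariance matrix of the multinomial distribution. In any other case extra variation is introduced compared to standard multinomial sampling, a well known property of the Dirichlet-multinomial distribution [see e.g. Neerchal and Morel (1998)].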
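The Dirichlet-multinomial probability mass function and its first two moments can be verified numerically on a small example. The sketch below uses only the Python standard library; the values of $n$ and $\boldsymbol{\delta}$ are arbitrary illustrations and the function names are hypothetical. It enumerates the full support, checks that the probabilities add up to one and compares the mean and variance of the first component with the closed-form expressions.

```python
import math


def dm_log_pmf(x, delta):
    """Log-probability of the count vector x under DM(n, delta), n = sum(x)."""
    n = sum(x)
    delta_plus = sum(delta)
    # Log multinomial coefficient n! / (x_1! ... x_k!).
    log_p = math.lgamma(n + 1) - sum(math.lgamma(xj + 1) for xj in x)
    log_p += math.lgamma(delta_plus) - math.lgamma(n + delta_plus)
    for xj, dj in zip(x, delta):
        log_p += math.lgamma(dj + xj) - math.lgamma(dj)
    return log_p


def compositions(n, k):
    """All vectors of k non-negative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


n, delta = 6, (2.0, 1.0, 0.5)
support = list(compositions(n, len(delta)))
probs = [math.exp(dm_log_pmf(x, delta)) for x in support]

total = sum(probs)
mean_first = sum(p * x[0] for p, x in zip(probs, support))
var_first = sum(p * (x[0] - mean_first) ** 2 for p, x in zip(probs, support))

delta_plus = sum(delta)
pi_first = delta[0] / delta_plus
expected_mean = n * pi_first
expected_var = (1 + (n - 1) / (delta_plus + 1)) * n * pi_first * (1 - pi_first)
print(total, mean_first, expected_mean, var_first, expected_var)
```

The factor $1 + (n-1)/(\delta_+ + 1)$ is the overdispersion relative to multinomial sampling; it approaches one as $\delta_+$ grows.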

Consider now that a matrix of (estimated) read counts is available for two different conditions, consisting of $n_1$ and $n_2$ replicates. Given two hyper-parameter vectors $\boldsymbol{\delta}_1$, $\boldsymbol{\delta}_2$, let

$$X_i^{(g)} \mid n_{1i}, \boldsymbol{\delta}_1 \sim \mathcal{DM}(n_{1i}, \boldsymbol{\delta}_1), \quad \text{independent for } i = 1, \ldots, n_1,$$
$$Y_j^{(g)} \mid n_{2j}, \boldsymbol{\delta}_2 \sim \mathcal{DM}(n_{2j}, \boldsymbol{\delta}_2), \quad \text{independent for } j = 1, \ldots, n_2,$$

where $X_i^{(g)}$, $Y_j^{(g)}$ denote two independent vectors of (estimated) number of reads for the transcripts of gene $g = 1, \ldots, G$ for replicate $i = 1, \ldots, n_1$ and $j = 1, \ldots, n_2$ for the first and second condition, respectively. Here $n_{1i}$ and $n_{2j}$ denote the total number of reads generated from gene $g$ for the first and second condition for replicates $i$ and $j$.

In this context, DTU inference is based on comparing the hyper-parameters of the Dirichlet-Multinomial distribution. Note that $\boldsymbol{\delta}_1$ and $\boldsymbol{\delta}_2$ are proportional to the average expression level of the specific set of transcripts. Typically, there are large differences in the scale of these parameters, thus their direct comparison does not reveal any evidence for DTU. For this reason, it is essential to reparametrize the model as follows:

$$\boldsymbol{\delta}_1 = d_1 g_1, \qquad \boldsymbol{\delta}_2 = d_2 g_2, \tag{4}$$

where $d_1 > 0$, $d_2 > 0$ and $g_1 = (g_{11}, \ldots, g_{1k})$, $g_2 = (g_{21}, \ldots, g_{2k})$, with $\sum_{i=1}^{k} g_{1i} = \sum_{i=1}^{k} g_{2i} = 1$ and $g_{1i}, g_{2i} > 0$, $i = 1, \ldots, k$.

In this case, DTU inference is based on comparing the null model:

$$\mathcal{M}_0 : g_1 = g_2$$

versus the full model where

$$\mathcal{M}_1 : g_1 \neq g_2.$$

A likelihood ratio test is implemented in the DRIMSeq package for testing the hypothesis of the null versus the full model. In this work, we propose to compare the two models by applying approximate Bayesian model selection techniques. In particular, a priori it is assumed that

$$d_i \sim \mathcal{E}(\lambda), \quad \text{independent for } i = 1, 2, \qquad g_i \sim \mathcal{D}(1, \ldots, 1), \quad \text{independent for } i = 1, 2, \tag{5}$$

and furthermore d_i and g_j are mutually independent.

In order to perform Bayesian model selection, the Bayes factor (Kass and Raftery, 1995) of the null against the full model is approximated using a two stage procedure. At first, the posterior distribution of each model is approximated using Laplace’s approximation (Laplace, 1774, 1986), a well established practice for approximating posterior moments and posterior distributions (Tierney and Kadane, 1986; Tierney et al., 1989; Azevedo-Filho and Shachter, 1994; Raftery, 1996). Then, the logarithms of the marginal likelihoods of $\mathcal{M}_0$ and $\mathcal{M}_1$ are estimated using independent samples from the approximate posterior distribution via self-normalized sampling importance resampling (Gordon et al., 1993). Finally, the posterior probabilities $p(\mathcal{M}_0 \mid x^{(g)}, y^{(g)})$ and $p(\mathcal{M}_1 \mid x^{(g)}, y^{(g)})$ are estimated assuming equally weighted prior probabilities.

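Denote by $g_0$ the common value of $g_1$, $g_2$ in model $\mathcal{M}_0$. Let $u_0 = (g_0, d_1, d_2) \in \mathcal{U}_0$, $u_1 = (g_1, g_2, d_1, d_2) \in \mathcal{U}_1$ denote the parameters associated with models $\mathcal{M}_0$ and $\mathcal{M}_1$, respectively. The underlying parameter spaces are defined as $\mathcal{U}_0 = \mathcal{P}_{K_g - 1} \times (0, +\infty)^2$ and $\mathcal{U}_1 = \mathcal{P}_{K_g - 1}^2 \times (0, +\infty)^2$. The marginal likelihood of the data under model $\mathcal{M}_j$ is defined as

$$f(x^{(g)}, y^{(g)} \mid \mathcal{M}_j) = \int_{\mathcal{U}_j} f(x^{(g)}, y^{(g)} \mid u_j)\, f(u_j \mid \lambda)\, \mathrm{d}u_j, \quad j = 0, 1.$$

According to the basic importance sampling identity, the marginal likelihood can be evaluated using another density $\phi$, which is absolutely continuous on $\mathcal{U}_j$, as follows

$$f(x^{(g)}, y^{(g)} \mid \mathcal{M}_j) = \int_{\mathcal{U}_j} \frac{f(x^{(g)}, y^{(g)} \mid u_j)\, f(u_j \mid \lambda)}{\phi(u_j)}\, \phi(u_j)\, \mathrm{d}u_j.$$

The minimum requirement for $\phi$ is to satisfy $\phi(u_j) > 0$ whenever $f(x^{(g)}, y^{(g)} \mid u_j)\, f(u_j \mid \lambda) > 0$. Assume that a sample $\{u_j^{(i)};\ i = 1, \ldots, n\}$ is drawn from $\phi(\cdot)$. Then, the importance sampling estimate of the marginal likelihood is

$$\hat{f}(x^{(g)}, y^{(g)} \mid \mathcal{M}_j) = \frac{1}{n}\sum_{i=1}^{n}\frac{f(x^{(g)}, y^{(g)} \mid u_j^{(i)})\, f(u_j^{(i)} \mid \lambda)}{\phi(u_j^{(i)})}, \quad j = 0, 1.$$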
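The importance sampling estimator can be illustrated on a deliberately simple stand-in problem where the marginal likelihood is known in closed form. The sketch below does not use the Dirichlet-multinomial model or a Laplace approximation: it assumes a binomial likelihood with a Uniform(0, 1) prior, for which the exact marginal likelihood equals $1/(\text{trials} + 1)$, and a Beta proposal $\phi$ that is slightly more diffuse than the exact posterior. All names and numerical settings are hypothetical.

```python
import math
import random


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def importance_sampling_evidence(successes, trials, a, b, n_draws, rng):
    """Average of likelihood * prior / proposal over draws from Beta(a, b)."""
    failures = trials - successes
    log_binom = (math.lgamma(trials + 1) - math.lgamma(successes + 1)
                 - math.lgamma(failures + 1))
    total = 0.0
    for _ in range(n_draws):
        u = rng.betavariate(a, b)  # draw from the proposal phi
        log_lik = log_binom + successes * math.log(u) + failures * math.log(1 - u)
        log_prior = 0.0  # Uniform(0, 1) prior density
        log_phi = ((a - 1) * math.log(u) + (b - 1) * math.log(1 - u)
                   - log_beta(a, b))
        total += math.exp(log_lik + log_prior - log_phi)
    return total / n_draws


rng = random.Random(1)
estimate = importance_sampling_evidence(6, 20, a=5.6, b=12.0,
                                        n_draws=20000, rng=rng)
exact = 1.0 / 21.0
print(estimate, exact)
```

The proposal is chosen with heavier tails than the exact posterior Beta(7, 15), so the importance weights stay bounded; a proposal with lighter tails than the target can yield an estimator with very large variance.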

The candidate distribution $\phi$ is the approximation of the posterior distribution obtained by Laplace’s method. It is well known that basic importance sampling performs reasonably well when the number of parameters is not too large. However, it can be drastically improved using sequential Monte Carlo methods, such as sampling importance resampling (Gordon et al., 1993; Liu and Chen, 1998). The R package LaplacesDemon (Statisticat and LLC., 2016) is used for this purpose.

Finally, the posterior probability of the DTU model is defined as

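$$p_g = \mathbb{P}(\mathcal{M}_1 \mid x^{(g)}, y^{(g)}) \propto f(x^{(g)}, y^{(g)} \mid \mathcal{M}_1)\, P(\mathcal{M}_1), \quad g = 1, \ldots, G, \tag{6}$$

by also assuming equally weighted prior probabilities, that is, $P(\mathcal{M}_1) = P(\mathcal{M}_0) = 0.5$. Note that the Bayes factor of the null against the full model is then given by

$$B_{01}^{(g)} = \frac{\mathbb{P}(\mathcal{M}_0 \mid x^{(g)}, y^{(g)})}{\mathbb{P}(\mathcal{M}_1 \mid x^{(g)}, y^{(g)})} = \frac{f(x^{(g)}, y^{(g)} \mid \mathcal{M}_0)}{f(x^{(g)}, y^{(g)} \mid \mathcal{M}_1)}, \quad g = 1, \ldots, G,$$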

since the prior odds ratio is equal to one.
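Given estimates of the two log marginal likelihoods, the posterior probability of the DTU model and the Bayes factor follow by simple arithmetic. The sketch below works on the log scale to avoid numerical underflow; the function names and the example values are hypothetical.

```python
import math


def posterior_prob_full_model(log_m0, log_m1, prior_m1=0.5):
    """P(M1 | data) from the log marginal likelihoods of M0 and M1."""
    log_odds = (log_m1 + math.log(prior_m1)) - (log_m0 + math.log(1 - prior_m1))
    # Numerically stable logistic transform of the posterior log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def log_bayes_factor_01(log_m0, log_m1):
    """Log Bayes factor of the null model against the full model."""
    return log_m0 - log_m1


# Hypothetical log marginal likelihood estimates for one gene.
log_m0, log_m1 = -105.0, -102.0
p_g = posterior_prob_full_model(log_m0, log_m1)
log_b01 = log_bayes_factor_01(log_m0, log_m1)
print(p_g, log_b01)
```

With equal prior probabilities the posterior odds of the null against the full model coincide with the Bayes factor, so $p_g = 1/(1 + B_{01}^{(g)})$.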

When low expressed transcripts are included in the computation, the Laplace approximation often runs into convergence problems. We have found that this problem can be alleviated by pre-filtering low expressed transcripts, as also pointed out by Soneson et al. (2015).

4 Bayesian FDR control for the problem of DTU

In this section we consider various decision rules in order to control the False Discovery Rate (FDR) (Benjamini and Hochberg, 1995; Storey, 2003; Müller et al., 2004, 2006). Decision rules (7) and (9) are taking into account the whole set of genes and make use of the raw and transformed posterior probabilities, respectively. Intuitively, the transformation of posterior probabilities prioritizes genes consisting of transcripts with large changes in their expression. Decision rules (8) and (10) are based on filtering the output of (7) and (9) according to a trust region.

A decision rule based on the raw gene-level posterior probabilities of DTU, as defined in Equations (3) and (6), is the following.

$$d_{1g} = \begin{cases} 1, & \hat{p}_g \geq 1 - \alpha\\ 0, & \text{otherwise.} \end{cases} \tag{7}$$

Note that for the problem of inferring DTE the decision rule (7) is the one used by Leng et al. (2013). However, the cjBitSeq model takes into account changes to any subset of transcripts within a gene, thus, (7) may identify a large number of genes consisting of relatively small changes in low expressed transcripts. A more conservative choice will focus our attention to the dominant transcripts, where more reads are available and potentially the results will be more robust.

Next we define a filtering of the output based on a “trust region.” Let $i$ and $j$ denote the estimated dominant transcripts in condition A and B, respectively. The trust region corresponds to the subset of genes where the relative ordering of the estimated expression levels of the dominant transcripts switches, that is,

$$G_0 = \left\{g = 1, \ldots, G : \left(\hat{\theta}_i^{(g)} - \hat{\theta}_j^{(g)}\right)\left(\hat{w}_i^{(g)} - \hat{w}_j^{(g)}\right) < 0\right\}.$$

Switching events between dominant transcripts have been proposed as a major source of DTU in real RNA-seq data (Gonzàlez-Porta et al., 2013).

Note that in the previous expression we used the notation of transcript expression levels according to cjBitSeq. For BayesDRIMSeq 𝜽 and w should be replaced by g₁ and g₂, respectively. The decision rule which corresponds to filtering (7) according to G₀ is the following:

$$d_{2g} = \begin{cases} 1, & \hat{p}_g \geq 1 - \alpha \text{ and } g \in G_0\\ 0, & \text{otherwise.} \end{cases} \tag{8}$$
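Decision rules (7) and (8) can be sketched for a single gene as follows. This is only an illustration under the assumption that estimates of the within gene relative expression in the two conditions are available as plain lists; the function names and the numerical values are hypothetical.

```python
def dominant(usage):
    """Index of the transcript with the largest estimated relative expression."""
    return max(range(len(usage)), key=usage.__getitem__)


def in_trust_region(theta_hat, w_hat):
    """True if the ordering of the two dominant transcripts switches."""
    i = dominant(theta_hat)  # dominant transcript in condition A
    j = dominant(w_hat)      # dominant transcript in condition B
    return (theta_hat[i] - theta_hat[j]) * (w_hat[i] - w_hat[j]) < 0


def rule_d1(p_hat, alpha):
    """Rule (7): threshold the raw posterior probability of DTU."""
    return int(p_hat >= 1 - alpha)


def rule_d2(p_hat, alpha, theta_hat, w_hat):
    """Rule (8): rule (7) restricted to genes in the trust region."""
    return int(p_hat >= 1 - alpha and in_trust_region(theta_hat, w_hat))


alpha = 0.05
# Gene with a switch between its two dominant transcripts.
switch = rule_d2(0.99, alpha, [0.7, 0.2, 0.1], [0.2, 0.7, 0.1])
# Gene with strong posterior evidence but the same dominant transcript.
no_switch = rule_d2(0.99, alpha, [0.7, 0.3], [0.55, 0.45])
print(rule_d1(0.99, alpha), switch, no_switch)
```

When the same transcript is dominant in both conditions the product in the trust region condition is exactly zero, so the gene is excluded regardless of its posterior probability.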

Note that decision rules d₁ and d₂ are solely based on the posterior probabilities of gene DTU and the trust region, respectively. However, it makes sense to also take into account additional information, such as the magnitude of the change of the within gene relative transcript expression, which is a by-product of our algorithm.

In order to clarify this, consider the following example. Assume that genes $g_1$ and $g_2$ both consist of two transcripts. For $g_1$, let $\theta_1^{(g_1)} = 0.1$, $\theta_2^{(g_1)} = 0.9$ and $w_1^{(g_1)} = 0.9$, $w_2^{(g_1)} = 0.1$. For $g_2$, let $\theta_1^{(g_2)} = 0.4$, $\theta_2^{(g_2)} = 0.6$ and $w_1^{(g_2)} = 0.6$, $w_2^{(g_2)} = 0.4$. Furthermore, assume that the posterior evidence of DE is the same for both genes, that is, $\hat{p}_{g_1} = \hat{p}_{g_2} = p$. In the case that the posterior probability $p$ is sufficiently large, genes $g_1$ and $g_2$ will be given the same importance in our discovery list. Note however that for gene $g_1$ the absolute change in relative expression (0.8) is 4 times larger than for gene $g_2$ (0.2). Ideally, we would like our discovery list to rank gene $g_1$ higher than gene $g_2$. This is achieved using the following FDR control procedure.

Consider any (Bayesian) method that for each gene yields an estimate of the posterior probability of DTU per gene p_g, g = 1, …, G.

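For a given permutation $\boldsymbol{\tau} = (\tau_1, \tau_2, \ldots, \tau_G)$ of $\{1, 2, \ldots, G\}$, let $q_g = p_{\tau_g}$, $g = 1, \ldots, G$.
Define: $r_g = \frac{1}{g}\sum_{j=1}^{g}(1 - q_j)$, $g = 1, \ldots, G$.
For $0 < \alpha < 1$, consider the decision rule:
$$d_{3g} = \begin{cases} 1, & 1 \leq g \leq g^*\\ 0, & g^* + 1 \leq g \leq G \end{cases} \tag{9}$$
where $g^* := \max\{g = 1, \ldots, G : r_g \leq \alpha\}$.
The estimated expected FDR given the data then satisfies $\widehat{\mathbb{E}}(\mathrm{FDR} \mid \text{data}) = \frac{1}{g^*}\sum_{j=1}^{g^*}(1 - q_j) \leq \alpha$.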

Here we mention that in the original implementation of cjBitSeq for the DTE problem, the permutation τ was defined as the one that orders the posterior probabilities of transcript DE in decreasing order.

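The permutation that takes into account the previously described concept of magnitude change is defined as follows. Let $\rho_g = \max_{k = 1, \ldots, K_g}\left|\hat{\theta}_k^{(g)} - \hat{w}_k^{(g)}\right|$, where $\hat{\theta}_k^{(g)}$ and $\hat{w}_k^{(g)}$ denote the posterior mean estimates of within gene transcript expression for a given transcript $k$ of gene $g$. Consider the permutation $\boldsymbol{\tau} = (\tau_1, \tau_2, \ldots, \tau_G)$ that orders the set $\{\rho_g;\ g = 1, \ldots, G\}$ in decreasing order, that is:

$$\rho_{\tau_1} \geq \rho_{\tau_2} \geq \ldots \geq \rho_{\tau_G}.$$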

Finally, we combine decision rule d₃ with the trust region G₀ to obtain our final decision rule, that is,

$$d_{4g} = \begin{cases} 1, & 1 \leq g \leq g^* \text{ and } g \in G_0\\ 0, & \text{otherwise.} \end{cases} \tag{10}$$
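Decision rules (9) and (10) can be sketched as follows, assuming that for each gene a posterior probability of DTU, the magnitude of change $\rho_g$ and a trust region indicator are already available. The function names and the four-gene example are hypothetical. Note that under an ordering by $\rho_g$ the running average $r_g$ is not necessarily monotone, so the code scans all ranks and keeps the largest one that satisfies the bound.

```python
def rule_d3(p_hat, rho, alpha):
    """Rule (9): rank genes by decreasing rho, keep the first g* of them."""
    G = len(p_hat)
    tau = sorted(range(G), key=lambda g: rho[g], reverse=True)
    running = 0.0
    g_star = 0  # stays 0 (no discoveries) if the bound is never met
    for rank, gene in enumerate(tau, start=1):
        running += 1.0 - p_hat[gene]
        if running / rank <= alpha:
            g_star = rank  # largest rank whose estimated FDR is below alpha
    selected = set(tau[:g_star])
    return [int(g in selected) for g in range(G)]


def rule_d4(p_hat, rho, alpha, trust):
    """Rule (10): rule (9) restricted to genes inside the trust region."""
    d3 = rule_d3(p_hat, rho, alpha)
    return [int(d and t) for d, t in zip(d3, trust)]


# Hypothetical gene-level summaries for four genes.
p_hat = [0.99, 0.97, 0.60, 0.995]
rho = [0.8, 0.2, 0.5, 0.6]
trust = [True, True, True, False]

d3 = rule_d3(p_hat, rho, alpha=0.05)
d4 = rule_d4(p_hat, rho, alpha=0.05, trust=trust)
print(d3, d4)
```

In this example the second gene has a high posterior probability but the smallest change in relative expression, so it is ranked last and falls outside the discovery list once the low-probability third gene has pushed the running average above the bound.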

5 Simulation study

In order to assess the performance of the proposed methods and decision rules as well as to compare against existing models, a set of simulation studies is used. Instead of setting up our own simulation scenarios, we followed the pipeline introduced in the recent study of Soneson et al. (2015), where a large number of count-based methods are benchmarked. Synthetic RNA-seq reads are generated from the Drosophila melanogaster and Homo sapiens transcriptomes using the RSEM-simulator (Li and Dewey, 2011). The model parameters for RSEM-simulator were estimated from real datasets using a Negative Binomial model described in Soneson and Delorenzi (2013). The transcriptomes of these two organisms exhibit strong differences, as illustrated in Figure 3. The average number of transcripts per gene is considerably smaller for fruit fly; however, the transcripts are longer than for human [see also Supplementary Table 1 of Soneson et al. (2015)].

Figure 3:

Frequencies (in log scale) of number of annotated transcripts per gene for drosophila (up) and human (down). The total number of genes and transcripts is 13937 and 26951 for drosophila and 20410 and 145342 for human, respectively.

Following Soneson et al. (2015), for each organism we simulated 3 replicates per condition. Each replicate consists of 25 million paired-end reads with length 101 base-pairs. Differential transcript usage was introduced for 1000 genes, by reversing the relative abundance of the two most abundant transcripts in one of the two conditions. The total number of reads for each transcript may or may not be equal across conditions. If the total number of reads generated from a gene is constant, no gene-level differential expression is evident. For the drosophila reads no gene-level differential expression was introduced. For human reads both cases are considered. Finally, the simulated reads are mapped to the genome or transcriptome with Tophat2 (Trapnell et al., 2009) and Bowtie2 (Langmead et al., 2009), respectively. Cufflinks and HTSeq used the alignment files produced by Tophat2, while BitSeqVB and cjBitSeq use the alignment produced from Bowtie2, allowing a maximum of 100 hits per read. The count matrix used as input to DEXSeq is estimated using the default HTSeq method, while BitSeqVB is used for input to edgeR, limma, DRIMSeq and BayesDRIMSeq.

5.1 Comparison of Bayesian decision rules

Figure 4 displays the power versus achieved FDR using the decision rules d_k; k = 1, 2, 3, 4 for the three simulated datasets. Each rule was evaluated at four typical values of expected FDR levels, α = 0.01, 0.025, 0.05, 0.1, which are shown as dashed vertical lines. The plotted points correspond to the achieved FDR (x axis) and the proportion of true discoveries (y axis). The ability of each decision rule to control the FDR depends on the distance of each point from the corresponding vertical line: the closer, the better. On the other hand, a decision rule with higher y values is more powerful.
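The two coordinates of each plotted point can be computed as in the following sketch, assuming that the genes called DTU at a given level and the genes with simulated DTU are available as sets of identifiers. The function name and the toy gene identifiers are illustrative assumptions.

```python
def achieved_fdr_and_power(called, truth):
    """Achieved FDR and power for one decision rule at one nominal level.

    called: iterable of gene ids declared as DTU.
    truth:  iterable of gene ids with simulated DTU.
    """
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)
    false_pos = len(called - truth)
    # By convention the FDR is zero when nothing is called.
    fdr = false_pos / len(called) if called else 0.0
    power = true_pos / len(truth) if truth else 0.0
    return fdr, power


# Toy example with hypothetical gene identifiers.
truth = {"g1", "g2", "g3", "g4"}
called = {"g1", "g2", "g3", "g9", "g10"}
fdr, power = achieved_fdr_and_power(called, truth)
print(fdr, power)
```

A rule is well calibrated at level α when the returned `fdr` is close to α, and more powerful when `power` is larger.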

Figure 4:

Power versus achieved FDR plot using the decision rules d₁, d₂, d₃ and d₄ for cjBitSeq (1st row) and BayesDRIMSeq with (second row) and without (third row) isoform pre-filtering on the simulated data. The vertical dashed lines show the expected FDR level (0.01, 0.025, 0.05, 0.1). (A) Drosophila, (B) human without DTE and (C) human with DTE.

For cjBitSeq (upper panel) we conclude that the trust-region adjusted rules d₂ and d₄ achieve lower FDRs which are quite close to the expected values. However, note that d₄ yields better power compared to d₂, especially for the human datasets. BayesDRIMSeq is shown in the middle and lower panels of Figure 4. In the middle panel we applied BayesDRIMSeq after filtering out transcripts with an average number of reads less than 20. The results corresponding to the full set of transcripts (no pre-filtering) are shown in the lower panel. We conclude that isoform pre-filtering is essential in order to achieve reasonable control of FDR in the case of human data. Note also that under isoform pre-filtering the trust region does not have a high impact on BayesDRIMSeq.
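The pre-filtering step can be sketched as follows. The sketch assumes that the average is taken over all samples of both conditions and that counts are stored in a dictionary keyed by transcript identifier; both are assumptions made for illustration, and the identifiers are hypothetical.

```python
def prefilter_transcripts(counts, min_mean=20.0):
    """Keep transcripts whose average estimated read count is at least `min_mean`.

    counts: dict mapping a transcript id to a list of per-sample counts.
    Assumption: the average is computed across all samples.
    """
    return {
        transcript: values
        for transcript, values in counts.items()
        if sum(values) / len(values) >= min_mean
    }


# Toy count matrix: 3 replicates per condition, hypothetical transcript ids.
toy_counts = {
    "tx1": [120, 98, 110, 40, 35, 52],
    "tx2": [3, 0, 5, 1, 2, 0],
    "tx3": [25, 18, 22, 19, 21, 20],
}
filtered = prefilter_transcripts(toy_counts)
print(sorted(filtered))
```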

5.2 Comparison against existing methods

For cuffdiff, DRIMSeq, edgeR and limma we use the gene-level p-values in the ROC and precision/recall plots and the adjusted q-values in the power versus achieved FDR plot. However, DEXSeq reports only the adjusted q-values, hence this method is not shown in the ROC and precision/recall curves. Note that for all these methods, the adjusted q-values correspond to the Benjamini and Hochberg (1995) FDR control procedure. For cjBitSeq we used the raw FDR (9) in the ROC and precision/recall curves and the adjusted FDR (10) in the power versus achieved FDR plots. For BayesDRIMSeq we used the raw FDR (9) in all plots, after pre-filtering isoforms with an average number of reads less than 20.
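For reference, the Benjamini and Hochberg (1995) adjustment applied by the frequentist methods can be written in a few lines. This is a plain re-implementation for illustration, not the code used by any of the benchmarked packages, and the p-values in the example are arbitrary.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted in increasing order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest and enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Arbitrary gene-level p-values for illustration.
qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(qvalues)
```

A gene is then declared DTU at level α whenever its adjusted value does not exceed α.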

The performance measures of the evaluated methods are shown in Figure 5. Comparing results for the two organisms, it is clear that edgeR, limma, DEXSeq and frequentist DRIMSeq exhibit large differences in their ability to control the FDR. In particular, these methods exhibit considerably larger false discovery rates for the human datasets compared to drosophila. On the other hand, cjBitSeq and BayesDRIMSeq produce consistent results in all cases, while at the same time keeping the achieved FDR within the (0, 0.1) range.

Figure 5:

Performance measures for drosophila (1st row), human without DTE (2nd row) and human with DTE (3rd row). (A) Power versus achieved FDR plot. The vertical dashed lines show the expected FDR level (0.01, 0.025, 0.05, 0.1). (B) ROC curve. (C) Precision/recall curve.

More specifically, for the drosophila example observe that cjBitSeq exhibits a smaller achieved FDR and a larger True Positive Rate compared to DRIMSeq, DEXSeq and edgeR. BayesDRIMSeq achieves even smaller FDRs but the number of True Positives is reduced compared to cjBitSeq. For the human examples we conclude that cjBitSeq and BayesDRIMSeq exhibit similar performance in terms of FDR control; however, the former is able to discover a larger number of DTU genes in both cases. DRIMSeq and DEXSeq achieve FDRs in the range (0.12, 0.30), but DRIMSeq also achieves larger True Positive Rates compared to DEXSeq. Cuffdiff exhibits an almost perfect control of the FDR, at the cost of substantially reduced power. The ROC and precision/recall curves, shown in Figure 5(B) and (C) respectively, suggest that cjBitSeq and DRIMSeq are consistently ranked higher than other methods. Overall, we conclude that cjBitSeq outperforms all other methods.

The run-time per method is illustrated in Figure 6, with respect to the maximum amount of virtual memory used by each process. For the counting-based methods, the main computational burden of the two-stage pipeline is due to the first stage (that is, either HTSeq or BitSeqVB). DRIMSeq, edgeR and limma, which use BitSeqVB as input, exhibit nearly identical computing performance, so only DRIMSeq is shown. Compared to the counting-based methods, cjBitSeq requires longer computing times, which should be expected given that cjBitSeq performs MCMC sampling on the space of all possible configurations of each transcript using the read alignments as input. However, note that cjBitSeq is quite efficient with respect to the memory used and that both memory and computing time scale strongly with the number of available cores. Therefore, it is suggested to run cjBitSeq using at least 8 cores, since the memory requirements stay within reasonable levels. Finally, we mention that isoform pre-filtering is also essential for the computing time of BayesDRIMSeq. When no filtering takes place, the wall clock time increases almost 2.5 times for drosophila and 4.3 times for the human datasets.

Figure 6:

Wall clock runtime versus maximum value (in log-scale) of virtual memory used. The number of cores used by each process is shown in parentheses. For each dataset the total number of reads is equal to 150 million.

6 Adenocarcinoma dataset

In this section we benchmark the new Bayesian methods against DRIMSeq using real RNA-seq data from human lung normal and adenocarcinoma samples from six Korean female nonsmoking patients (Kim et al., 2013). The data corresponds to samples from GSM927308 to GSM927319 and was downloaded from NCBI’s Gene Expression Omnibus (GEO) under the accession number GSE37764: SRR493937, SRR493939, SRR493941, SRR493943, SRR493945, SRR493947, SRR493949, SRR493951, SRR493953, SRR493955, SRR493957, SRR493959.

The data consist of paired-end reads with length equal to 78 base pairs which were mapped to the reference transcriptome using Bowtie2. The overall alignment rates and the total number of mapped reads range between (70%, 85%) and (22 × 10⁶, 30 × 10⁶), respectively. Next, BitSeq was used in order to calculate the matrix of alignment probabilities (as input to cjBitSeq) as well as to obtain a matrix of estimated counts per transcript (as input to DRIMSeq and BayesDRIMSeq).

Following Nowicka and Robinson (2016), we benchmark our methods using two comparisons: (a) a two-group comparison of 6 normal versus 6 cancer samples and (b) “mock” comparisons where 3 versus 3 samples from the normal condition are compared. For the latter scenario the expectation is to detect no DTU since replicates of the same condition are compared, although the biological variation between the replicates of the normal condition is high (as noted by Nowicka and Robinson, 2016). The results are displayed in Figure 7, using different cutoff values for controlling the FDR. For the 6 normal versus 6 cancer samples comparison (Figure 7A), we conclude that all decision rules contain a large number of genes which overlap with DRIMSeq (green colored regions), especially for the trust-region adjusted rules d₂ and d₄. For the “mock” comparison (Figure 7B), first note that a smaller number of DTU genes is inferred. Second, observe that the decision rule d₄ is capable of substantially reducing the number of false discoveries compared to DRIMSeq and that this number is almost zero when using α = 0.01.
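To make the construction of a "mock" comparison concrete, the following sketch enumerates the distinct ways of splitting six normal samples into two groups of three. It is illustrative only: the text does not state which splits were actually analysed, and the sample labels N1 to N6 are hypothetical.

```python
from itertools import combinations


def mock_splits(samples, group_size=3):
    """Enumerate distinct splits of `samples` into two groups.

    The first sample is always placed in group A, which removes mirrored
    duplicates when both groups have the same size (e.g. 3 versus 3).
    """
    samples = list(samples)
    anchor, rest = samples[0], samples[1:]
    splits = []
    for others in combinations(rest, group_size - 1):
        group_a = (anchor,) + others
        group_b = tuple(s for s in samples if s not in group_a)
        splits.append((group_a, group_b))
    return splits


# Hypothetical labels for the six normal samples.
normal_samples = ["N1", "N2", "N3", "N4", "N5", "N6"]
splits = mock_splits(normal_samples)
print(len(splits))
print(splits[0])
```

Any gene reported as DTU in such a split counts as a false discovery, since both groups come from the same condition.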

Figure 7:

Inferred number of genes with DTU (n) at level α ∈ {0.01, 0.025, 0.05} for the comparison of 6 control and 6 tumor samples and null comparisons of 3 versus 3 control samples. For the null comparisons no differential splicing is expected. For the Bayesian methods cjBitSeq and BayesDRIMSeq all 4 decision rules are used. Green color corresponds to the number of DTU genes detected by each method that overlap with DRIMSeq and red corresponds to the opposite case. (A) 6 normal versus 6 cancer samples. (B) 3 normal versus 3 normal samples.
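The green and red counts in this figure amount to a set comparison against the DRIMSeq list, as in the sketch below. The function name and the gene identifiers are hypothetical and serve only to illustrate the bookkeeping.

```python
def overlap_with_reference(method_genes, reference_genes):
    """Split the genes reported by a method into those shared with a
    reference list (green) and those not in the reference list (red)."""
    method_genes, reference_genes = set(method_genes), set(reference_genes)
    shared = len(method_genes & reference_genes)
    not_shared = len(method_genes - reference_genes)
    return shared, not_shared


# Hypothetical DTU gene lists at a fixed level alpha.
drimseq_genes = {"geneA", "geneB", "geneC", "geneD"}
bayes_genes = {"geneB", "geneC", "geneE"}
green, red = overlap_with_reference(bayes_genes, drimseq_genes)
print(green, red)
```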

7 Discussion

In this study we exemplified the use of Bayesian methods for inferring genes with differential transcript usage. For this purpose two previously introduced models were modified and extended: cjBitSeq and a Bayesian version of DRIMSeq. After defining proper decision rules we concluded that both methods exhibit superior or comparable performance with other methods. This was achieved by using the decision rule defined in Equation (9), shown in the ROC and precision-recall curves. According to (9), the whole sequence of posterior probabilities is transformed with respect to the ordering of the magnitude change of relative expression between conditions. For the read-based method (cjBitSeq) FDR control is improved when the decision rule is combined with a trust region. For the count-based method (BayesDRIMSeq) FDR control is mainly affected by the filtering of low-expressed transcripts, as previously reported under a frequentist context by Soneson et al. (2015). BayesDRIMSeq exhibits slightly better FDR control than cjBitSeq for the drosophila dataset, however this effect is not so evident for the human datasets. In all cases cjBitSeq is more powerful than BayesDRIMSeq, but at the cost of increased computing time.
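As a point of comparison, a generic posterior-probability rule for Bayesian FDR control can be sketched as follows. This is not the transformed rule of Equation (9), nor the trust-region adjustment: it is the plain rule that ranks genes by posterior probability of DTU and keeps the largest list whose expected FDR does not exceed α. The function name and the example probabilities are assumptions for illustration.

```python
def bayesian_fdr_select(posterior_probs, alpha):
    """Select genes by a generic Bayesian FDR rule.

    posterior_probs: posterior probability of DTU for each gene.
    Returns the indices of the selected genes, most probable first.
    The expected FDR of a list is the average of (1 - probability)
    over the genes it contains.
    """
    order = sorted(
        range(len(posterior_probs)),
        key=lambda i: posterior_probs[i],
        reverse=True,
    )
    selected, cumulative_error = [], 0.0
    for k, i in enumerate(order, start=1):
        cumulative_error += 1.0 - posterior_probs[i]
        # The running average is non-decreasing, so we can stop at the
        # first list size that exceeds alpha.
        if cumulative_error / k > alpha:
            break
        selected.append(i)
    return selected


# Arbitrary posterior probabilities for four genes.
chosen = bayesian_fdr_select([0.99, 0.2, 0.95, 0.6], alpha=0.05)
print(chosen)
```

The decision rules discussed above modify this basic scheme, so the sketch should be read only as the baseline that they refine.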

Regarding the analysis of real RNA-seq data, we compared our findings to DRIMSeq. We reported results based on a comparison of two different conditions, as well as “mock” comparisons of replicates within the same condition where no evidence of differential expression is expected. We concluded that our DTU lists contain a large number of genes also detected by DRIMSeq. Moreover, using conservative decision rules like d₄ we are able to substantially reduce the number of false discoveries when performing comparisons within the same condition.

The methods are available at https://github.com/mqbssppe/cjBitSeq (cjBitSeq) and https://github.com/mqbssppe/BayesDRIMSeq (BayesDRIMSeq). The source code for generating the simulated datasets of Soneson et al. (2015) is available from https://github.com/markrobinsonuzh/diff_splice_paper.

Acknowledgement

The research was supported by MRC award MR/M02010X/1, BBSRC award BB/J009415/1 and EU FP7 project RADIANT (grant 305626). The authors would like to acknowledge the assistance given by IT Services and the use of the Computational Shared Facility at The University of Manchester. Regarding BayesDRIMSeq and replication of simulations, helpful discussions with Mark Robinson, Malgorzata Nowicka and Charlotte Soneson (Institute of Molecular Life Sciences, University of Zurich) are gratefully acknowledged.

Appendix A: Prior sensitivity of BayesDRIMSeq

According to Equation (5), the prior assumptions of BayesDRIMSeq depend on the fixed hyperparameter λ. Figure 8 displays the power versus achieved FDR curves based on the decision rule d₃ as a function of λ ∈ {0.01, 0.1, 0.2, …, 1} (after isoform pre-filtering). We conclude that the value λ = 0.5 offers, perhaps, the best trade-off between power and FDR control. In particular, we note that values smaller than 0.5 tend to have small power and, on the other hand, values larger than 0.5 have larger rates of false discoveries. All results presented in the main paper correspond to λ = 0.5.

Figure 8:

Prior sensitivity of BayesDRIMSeq with respect to λ.

Appendix B: Using Kallisto counts

In the main text we used BitSeqVB count estimates as input to DRIMSeq and BayesDRIMSeq. According to the recent study of Hensman et al. (2015), BitSeqVB is ranked as one of the most accurate methods for estimating transcript expression levels. Since there is a variety of alternative methods for this purpose, we compare the performance when Kallisto (Bray et al., 2016) counts are being used as input. As shown in Figure 9, we conclude that in drosophila data both BayesDRIMSeq and DRIMSeq perform better when BitSeqVB counts are used. However there is no clear ordering in the human datasets: in both cases BitSeqVB counts correspond to increased power but at the cost of slightly worse FDR calibration. Finally, ROC and precision-recall curves suggest that BitSeqVB leads to slightly increased performance for both methods.

Figure 9:

Comparison of DRIMSeq and BayesDRIMSeq using BitSeq and Kallisto counts for drosophila (first row) and human data (second and third row).

References

Anders, S., P. T. Pyl, and W. Huber (2015): “HTSeq—a python framework to work with high-throughput sequencing data,” Bioinformatics, 31, 166–169. doi:10.1093/bioinformatics/btu638

Anders, S., A. Reyes, and W. Huber (2012): “Detecting differential usage of exons from RNA-seq data,” Genome Res., 22, 2008–2017. doi:10.1101/gr.133744.111

Azevedo-Filho, A. and R. D. Shachter (1994): “Laplace’s method approximations for probabilistic inference in belief networks with continuous variables,” in Proceedings of the tenth international conference on uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc., Burlington, MA, 28–36.

Benjamini, Y. and Y. Hochberg (1995): “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” J. R. Stat. Soc. Ser. B Methodol., 57, 289–300. doi:10.1111/j.2517-6161.1995.tb02031.x

Bray, N., H. Pimentel, P. Melsted and L. Pachter (2016): “Near-optimal RNA-Seq quantification,” Nat. Biotechnol., 34, 525–527. doi:10.1038/nbt.3519

Connor, R. J. and J. E. Mosimann (1969): “Concepts of independence for proportions with a generalization of the Dirichlet distribution,” J. Am. Stat. Assoc., 64, 194–206. doi:10.1080/01621459.1969.10500963

Endres, D. and J. Schindelin (2003): “A new metric for probability distributions,” Inf. Theory IEEE Trans., 49, 1858–1860. doi:10.1109/TIT.2003.813506

Gelfand, A. and A. Smith (1990): “Sampling-based approaches to calculating marginal densities,” J. Am. Stat. Assoc., 85, 398–409. doi:10.1080/01621459.1990.10476213

Geman, S. and D. Geman (1984): “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6, 721–741. doi:10.1016/B978-0-08-051581-6.50057-X

Glaus, P., A. Honkela and M. Rattray (2012): “Identifying differentially expressed transcripts from RNA-Seq data with biological variation,” Bioinformatics, 28, 1721–1728. doi:10.1093/bioinformatics/bts260

Gonzàlez-Porta, M., A. Frankish, J. Rung, J. Harrow and A. Brazma (2013): “Transcriptome analysis of human tissues and cell lines reveals one dominant transcript per gene,” Genome Biol., 14, R70. doi:10.1186/gb-2013-14-7-r70

Gordon, N. J., D. J. Salmond and A. F. Smith (1993): “Novel approach to nonlinear/non-Gaussian Bayesian state estimation,” in IEE Proceedings F (Radar and Signal Processing), volume 140, IET, 107–113. doi:10.1049/ip-f-2.1993.0015

Green, P. J. (1995): “Reversible jump Markov chain Monte Carlo computation and Bayesian model determination,” Biometrika, 82, 711–732. doi:10.1093/biomet/82.4.711

Hensman, J., P. Papastamoulis, P. Glaus, A. Honkela and M. Rattray (2015): “Fast and accurate approximate inference of transcript expression from RNA-seq data,” Bioinformatics, 31, 3881–3889. doi:10.1093/bioinformatics/btv483

Kass, R. E. and A. E. Raftery (1995): “Bayes factors,” J. Am. Stat. Assoc., 90, 773–795. doi:10.1080/01621459.1995.10476572

Kim, S. C., Y. Jung, J. Park, S. Cho, C. Seo, J. Kim, P. Kim, J. Park, J. Seo, J. Kim and S. Park (2013): “A high-dimensional, deep-sequencing study of lung adenocarcinoma in female never-smokers,” PLoS One, 8, e55596. doi:10.1371/journal.pone.0055596

Langmead, B., C. Trapnell, M. Pop and S. Salzberg (2009): “Ultrafast and memory-efficient alignment of short DNA sequences to the human genome,” Genome Biol., 10, R25. doi:10.1186/gb-2009-10-3-r25

Laplace, P. S. (1774): “Memoire sur la probabilite de causes par les evenemens,” Memoires de Mathematique et de Physique, Presentes a l’Academy Royale des Sciences, par divers Savans & lus dans ses Assemblees, Tome Sixieme, 621–656.

Laplace, P. S. (1986): “Memoir on the probability of the causes of events (translated by S.M. Stigler, University of Chicago),” Stat. Sci., 1, 364–378. doi:10.1214/ss/1177013621

Leng, N., J. A. Dawson, J. A. Thomson, V. Ruotti, A. I. Rissman, B. M. Smits, J. D. Haag, M. N. Gould, R. M. Stewart and C. Kendziorski (2013): “EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments,” Bioinformatics, 29, 1035–1043. doi:10.1093/bioinformatics/btt087

Li, B. and C. N. Dewey (2011): “RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome,” BMC Bioinf., 12, 323. doi:10.1186/1471-2105-12-323

Liu, J. S. (1994): “The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem,” J. Am. Stat. Assoc., 89, 958–966. doi:10.1080/01621459.1994.10476829

Liu, J. S. and R. Chen (1998): “Sequential Monte Carlo methods for dynamic systems,” J. Am. Stat. Assoc., 93, 1032–1044. doi:10.1080/01621459.1998.10473765

Liu, J. S., W. H. Wong and A. Kong (1995): “Covariance structure and convergence rate of the Gibbs sampler with various scans,” J. R. Stat. Soc. Ser. B Methodol., 57, 157–169. doi:10.1111/j.2517-6161.1995.tb02021.x

Mortazavi, A., B. Williams, K. McCue, L. Schaeffer and B. Wold (2008): “Mapping and quantifying mammalian transcriptomes by RNA-Seq,” Nat. Methods, 5, 621–628. doi:10.1038/nmeth.1226

Mosimann, J. E. (1962): “On the compound multinomial distribution, the multivariate β-distribution, and correlations among proportions,” Biometrika, 49, 65–82. doi:10.1093/biomet/49.1-2.65

Müller, P., G. Parmigiani and K. Rice (2006): “FDR and Bayesian multiple comparisons rules,” Proc. Valencia / ISBA 8th World Meeting on Bayesian Statistics.

Müller, P., G. Parmigiani, C. Robert, and J. Rousseau (2004): “Optimal sample size for multiple testing,” Journal of the American Statistical Association, 99, 990–1001. doi:10.1198/016214504000001646

Neerchal, N. K. and J. G. Morel (1998): “Large cluster results for two parametric multinomial extra variation models,” J. Am. Stat. Assoc., 93, 1078–1087. doi:10.1080/01621459.1998.10473769

Nicolae, M., S. Mangul, I. Mandoiu and A. Zelikovsky (2011): “Estimation of alternative splicing isoform frequencies from RNA-seq data,” Algorithms Mol. Biol., 6, 9. doi:10.1186/1748-7188-6-9

Nowicka, M. and M. Robinson (2016): “DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics,” F1000Research, 5, 1356. doi:10.12688/f1000research.8900.2

Osterreicher, F. and I. Vajda (2003): “A new class of metric divergences on probability spaces and its applicability in statistics,” Ann. Inst. Stat. Math., 55, 639–653. doi:10.1007/BF02517812

Papastamoulis, P., J. Hensman, P. Glaus and M. Rattray (2014): “Improved variational Bayes inference for transcript expression estimation,” Stat. Appl. Genet. Mol. Biol., 13, 213–216. doi:10.1515/sagmb-2013-0054

Papastamoulis, P. and G. Iliopoulos (2009): “Reversible jump mcmc in mixtures of normal distributions with the same component means,” Comput. Stat. Data Anal., 53, 900–911. doi:10.1016/j.csda.2008.10.022

Papastamoulis, P. and M. Rattray (2017): “A Bayesian model selection approach for identifying differentially expressed transcripts from RNA sequencing data,” J. R. Stat. Soc. Ser. C Appl. Stat., doi:10.1111/rssc.12213.

Raftery, A. E. (1996): “Approximate Bayes factors and accounting for model uncertainty in generalised linear models,” Biometrika, 83, 251–266. doi:10.1093/biomet/83.2.251

Richardson, S. and P. J. Green (1997): “On Bayesian analysis of mixtures with an unknown number of components,” J. R. Stat. Soc. Ser. B, 59, 731–758. doi:10.1111/1467-9868.00095

Ritchie, M. E., B. Phipson, D. Wu, Y. Hu, C. W. Law, W. Shi and G. K. Smyth (2015): “limma powers differential expression analyses for RNA-sequencing and microarray studies,” Nucleic Acids Res., 43, e47. doi:10.1093/nar/gkv007

Robinson, M., D. McCarthy and G. Smyth (2010): “edgeR: a Bioconductor package for differential expression analysis of digital gene expression data,” Bioinformatics, 26, 139–140. doi:10.1093/bioinformatics/btp616

Rossell, D., S.-O. C. Attolini, M. Kroiss and A. Stocker (2014): “Quantifying alternative splicing from paired-end RNA-sequencing data,” Ann. Appl. Stat., 8, 309–330. doi:10.1214/13-AOAS687

Soneson, C. and M. Delorenzi (2013): “A comparison of methods for differential expression analysis of RNA-seq data,” BMC Bioinf., 14, 91. doi:10.1186/1471-2105-14-91

Soneson, C., K. L. Matthes, M. Nowicka, C. W. Law and M. D. Robinson (2015): “Differential transcript usage from RNA-seq data: isoform pre-filtering improves performance of count-based methods,” Genome Biol., 17, 12. doi:10.1101/025387

Statisticat and LLC. (2016): LaplacesDemon: complete environment for Bayesian inference. URL https://CRAN.R-project.org/package=LaplacesDemon, R package version 16.0.1.

Storey, J. D. (2003): “The positive false discovery rate: A Bayesian interpretation and the q-value,” Ann. Stat., 31, 2013–2035. doi:10.1214/aos/1074290335

Tierney, L. and J. B. Kadane (1986): “Accurate approximations for posterior moments and marginal densities,” J. Am. Stat. Assoc., 81, 82–86. doi:10.1080/01621459.1986.10478240

Tierney, L., R. E. Kass and J. B. Kadane (1989): “Fully exponential Laplace approximations to expectations and variances of nonpositive functions,” J. Am. Stat. Assoc., 84, 710–716. doi:10.1080/01621459.1989.10478824

Trapnell, C., D. G. Hendrickson, M. Sauvageau, L. Goff, J. L. Rinn and L. Pachter (2013): “Differential analysis of gene regulation at transcript resolution with RNA-seq,” Nat. Biotechnol., 31, 46–53. doi:10.1038/nbt.2450

Trapnell, C., L. Pachter and S. Salzberg (2009): “TopHat: discovering splice junctions with RNA-Seq,” Bioinformatics, 25, 1105–1111. doi:10.1093/bioinformatics/btp120

Trapnell, C., B. A. Williams, G. Pertea, A. Mortazavi, G. Kwan, M. J. van Baren, S. L. Salzberg, B. J. Wold and L. Pachter (2010): “Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation,” Nat. Biotechnol., 28, 511–515. doi:10.1038/nbt.1621

Published Online: 2017-11-1

Published in Print: 2017-11-27

This work is licensed under the Creative Commons Attribution 4.0 Public License.
