Sample size calculations for the differential expression analysis of RNA-seq data using a negative binomial regression model.
Statistical Applications in Genetics and Molecular Biology (IF 0.8), Pub Date: 2019-01-22, DOI: 10.1515/sagmb-2018-0021
Xiaohong Li 1, 2 , Dongfeng Wu 1 , Nigel G F Cooper 2 , Shesh N Rai 1

High-throughput RNA sequencing (RNA-seq) technology is increasingly used in disease-related biomarker studies. Because RNA-seq read counts are over-dispersed, the negative binomial distribution has become the popular choice for modeling the read counts of genes. In this study, we propose two explicit sample size calculation methods for RNA-seq data based on a negative binomial regression model. To derive the new sample size formulas, we incorporate a common dispersion parameter and include the size factor as an offset through a natural logarithm link function. A two-sided Wald test statistic derived from the coefficient parameter is used to test a single gene at a nominal significance level of 0.05 and multiple genes at a false discovery rate of 0.05. The variance for the Wald test is computed from the variance-covariance matrix, with the parameters set to their maximum likelihood estimates under the unrestricted and constrained scenarios. Simulation studies evaluate the performance of the new formulas and compare them side by side with three existing methods based on a Wald test, a likelihood ratio test, or an exact test. Because the other methods are much more computationally intensive, we recommend our M1 method for quick and direct estimation of sample sizes in experimental design. Finally, we illustrate sample size estimation using an existing breast cancer RNA-seq dataset.
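The abstract does not reproduce the M1 or M2 formulas, so the following is only a minimal Python sketch of how a Wald-test sample size is typically computed under the kind of model it describes (log link, size factor as an offset, common dispersion). It assumes two equal-sized groups, a shared size factor `d`, variance `mu + phi * mu**2`, and the variance of the log fold-change evaluated at the alternative means. The function names and the Jung-style conversion from a target FDR to a per-gene alpha are illustrative assumptions, not the authors' method.

```python
"""Sketch: per-group sample size for a two-group negative binomial Wald test.

Assumptions (NOT the paper's M1/M2 formulas): equal group sizes, a common
dispersion phi, a common size factor d, and Var(beta1_hat) evaluated at the
unrestricted (alternative) means. Function names are hypothetical.
"""
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal, used for quantiles


def nb_wald_sample_size(mu0, fold_change, phi, d=1.0, alpha=0.05, power=0.8):
    """Smallest n per group so a two-sided Wald test of beta1 = 0 reaches
    approximately the requested power.

    Assumed model: log E[Y] = log(d) + beta0 + beta1 * x, Var(Y) = mu + phi * mu^2,
    so exp(beta1) is the fold change between the two groups.
    """
    if fold_change <= 0 or fold_change == 1:
        raise ValueError("fold_change must be positive and different from 1")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie in (0, 1)")
    mu1 = mu0 * fold_change
    # Fisher information for the log-mean of one NB observation is
    # mu / (1 + phi * mu), so each group adds (1/(d*mu) + phi) / n to the
    # variance of beta1_hat = log-mean(group 1) - log-mean(group 0).
    v = (1.0 / (d * mu0) + phi) + (1.0 / (d * mu1) + phi)
    z_a = _Z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_b = _Z.inv_cdf(power)
    n = (z_a + z_b) ** 2 * v / math.log(fold_change) ** 2
    return math.ceil(n)


def fdr_adjusted_alpha(m, m1, fdr=0.05, power=0.8):
    """Per-gene alpha giving roughly the target FDR (Jung-style approximation).

    Assumes m tested genes, m1 of them truly DE, equal power for all DE genes,
    and independent tests: solve m0*alpha / (m0*alpha + m1*power) = fdr.
    """
    m0 = m - m1
    if not (0 < m1 < m):
        raise ValueError("need 0 < m1 < m")
    r1 = m1 * power  # expected number of true discoveries
    return r1 * fdr / (m0 * (1 - fdr))


if __name__ == "__main__":
    # Single gene at nominal alpha = 0.05 (illustrative parameter values).
    n_single = nb_wald_sample_size(mu0=10, fold_change=2.0, phi=0.1)
    print(n_single)  # 6
    # Many genes at FDR = 0.05: convert to a per-gene alpha first.
    a_star = fdr_adjusted_alpha(m=10000, m1=100, fdr=0.05, power=0.8)
    print(nb_wald_sample_size(mu0=10, fold_change=2.0, phi=0.1, alpha=a_star))
```

The parameter values in the usage lines are made up for illustration. Evaluating the variance only under the alternative keeps the formula closed-form; the abstract notes that the paper also considers maximum likelihood estimates under the constrained (null) scenario, which would change the variance term.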

Updated: 2019-11-01