Power analysis for RNA-Seq differential expression studies using generalized linear mixed effects models.
BMC Bioinformatics (IF 2.9), Pub Date: 2020-05-19, DOI: 10.1186/s12859-020-3541-7
Lianbo Yu, Soledad Fernandez, Guy Brock

BACKGROUND: Power analysis has become an essential step in the experimental design of biomedical research. RNA-Seq experiments commonly use complex designs that give rise to diverse correlation structures. However, the field currently lacks statistical methods for calculating sample size and estimating power for RNA-Seq differential expression studies under such designs. Because the theoretical distributions of test statistics are typically unavailable for these designs, simulation-based methods have a clear advantage: they provide numerical solutions.

RESULTS: We propose a novel simulation-based procedure for estimating the power of differential expression analyses that uses generalized linear mixed effects models for correlated expression data. We also propose a new procedure for paired designs that uses a bivariate negative binomial distribution. With the proposed procedures, we compare the performance of the likelihood ratio test and the Wald test under a variety of simulation scenarios. To achieve the desired false positive control, the null distribution of the test statistics was estimated from the simulated distribution, which was compared with the asymptotic chi-square distribution. In addition, we applied the procedure for paired designs to the TCGA breast cancer data set.

CONCLUSIONS: We provide a framework for power estimation of RNA-Seq differential expression under complex experimental designs. Simulation results demonstrate that both proposed procedures properly control the false positive rate at the nominal level.
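The general idea of simulation-based power estimation with a simulated null distribution can be illustrated with a minimal Python sketch. This is not the authors' procedure. It makes several simplifying assumptions: paired counts are drawn from a Poisson-gamma mixture with a shared per-subject gamma factor (one simple way to obtain correlated negative binomial counts), and a paired t-type statistic on log counts stands in for the likelihood ratio and Wald tests of a fitted mixed model. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw one Poisson(lam) variate (Knuth's method; fine for modest lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_pairs(rng, n_pairs, mean, fold_change, dispersion):
    """Simulate (control, treated) counts that share a subject-level effect."""
    pairs = []
    for _ in range(n_pairs):
        # Gamma factor with mean 1 and variance `dispersion`; sharing it
        # within a pair induces the within-subject correlation (assumption).
        subject = rng.gammavariate(1.0 / dispersion, dispersion)
        control = poisson(rng, mean * subject)
        treated = poisson(rng, mean * fold_change * subject)
        pairs.append((control, treated))
    return pairs


def paired_statistic(pairs):
    """Absolute paired t-type statistic on log(count + 1) differences."""
    diffs = [math.log(t + 1) - math.log(c + 1) for c, t in pairs]
    center = statistics.mean(diffs)
    spread = statistics.stdev(diffs)  # needs at least two pairs
    if spread == 0.0:
        return math.inf if center != 0.0 else 0.0
    return abs(center) / (spread / math.sqrt(len(diffs)))


def estimate_power(n_pairs, mean, fold_change, dispersion,
                   alpha=0.05, n_sim=2000, seed=1):
    """Return (critical value from the simulated null, estimated power)."""
    rng = random.Random(seed)
    # Step 1: simulate under the null (fold change 1) to get a critical value.
    null_stats = sorted(
        paired_statistic(simulate_pairs(rng, n_pairs, mean, 1.0, dispersion))
        for _ in range(n_sim)
    )
    index = min(n_sim - 1, math.ceil((1 - alpha) * n_sim))
    critical = null_stats[index]
    # Step 2: simulate under the alternative and count rejections.
    rejections = sum(
        paired_statistic(
            simulate_pairs(rng, n_pairs, mean, fold_change, dispersion)
        ) > critical
        for _ in range(n_sim)
    )
    return critical, rejections / n_sim


if __name__ == "__main__":
    for n in (5, 10, 20):
        crit, power = estimate_power(n, mean=20.0, fold_change=1.5,
                                     dispersion=0.2)
        print(f"n_pairs={n:2d}  critical={crit:.2f}  power={power:.3f}")
```

Taking the critical value from the simulated null rather than from an asymptotic reference distribution mirrors the false positive control strategy described in the abstract. A sample size calculation would repeat the call over increasing `n_pairs` until the estimated power reaches the target.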

Updated: 2020-05-19