Combining assumptions and graphical network into gene expression data analysis
Journal of Statistical Distributions and Applications Pub Date : 2021-07-08 , DOI: 10.1186/s40488-021-00126-z
Demba Fofana¹, E. O. George², Dale Bowman²

Analyzing gene expression data rigorously requires taking statistical assumptions into consideration, but it also relies on information about the network relations that exist among genes. Combining these elements can not only improve statistical power but also provide a better framework for analyzing gene expression. We propose a novel statistical model that incorporates both assumptions and gene network information into the analysis. Assumptions matter because every test statistic is valid only when its required assumptions hold. We therefore propose hybrid p-values that take assumptions into consideration, and we show that, under the null hypothesis of primary interest, these p-values are uniformly distributed. We incorporate gene network information because neighboring genes share biological functions; this correlation is taken into account by assigning similar prior probabilities to neighboring genes. Our approach is compared with other approaches in a series of simulations. Areas under the ROC curves (AUCs) are computed to compare the methodologies, and the AUC of our methodology is larger than the others. For regression analysis, the AUC of our proposed method contains the AUCs of the Spearman and Pearson tests. In addition, true negative rates (TNRs), also known as specificities, are higher with our approach than with the other approaches. For instance, in the two-group comparison with a sample size of n = 10, the specificity of our proposed methodology is 0.716146, whereas the specificities of the t-test and the rank-sum test are 0.689223 and 0.69797, respectively. Our method, which combines assumptions and network information, is shown to be more powerful. The proposed procedures are introduced as a general class of methods that can incorporate procedure selection, account for multiple testing, and incorporate graphical network information into the analysis. We obtain very good performance both in simulations and in real data analysis.
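The comparison above rests on two standard evaluation metrics, specificity (TNR) and AUC, computed from per-gene p-values when the truly null genes are known, as in a simulation. The sketch below shows one way these could be computed; it is not the authors' code. The function names, the 0.05 cutoff, and the toy p-values are assumptions chosen for illustration. The AUC uses the Mann-Whitney form: the probability that a randomly chosen non-null gene has a smaller p-value than a randomly chosen null gene, with ties counted as one half.

```python
from typing import Sequence


def specificity(p_values: Sequence[float], is_null: Sequence[bool], alpha: float = 0.05) -> float:
    """True negative rate: share of truly null genes NOT rejected at level alpha.

    The cutoff alpha = 0.05 is an illustrative assumption.
    """
    null_p = [p for p, null in zip(p_values, is_null) if null]
    if not null_p:
        raise ValueError("need at least one truly null gene")
    true_negatives = sum(1 for p in null_p if p > alpha)
    return true_negatives / len(null_p)


def auc(p_values: Sequence[float], is_null: Sequence[bool]) -> float:
    """Area under the ROC curve, treating a smaller p-value as stronger evidence.

    Mann-Whitney form: P(p_nonnull < p_null), with ties counted as 1/2.
    """
    alt_p = [p for p, null in zip(p_values, is_null) if not null]
    null_p = [p for p, null in zip(p_values, is_null) if null]
    if not alt_p or not null_p:
        raise ValueError("need both null and non-null genes")
    score = 0.0
    for a in alt_p:          # every (non-null, null) pair of genes
        for n in null_p:
            if a < n:
                score += 1.0  # non-null gene correctly ranked ahead
            elif a == n:
                score += 0.5  # tie: half credit
    return score / (len(alt_p) * len(null_p))


if __name__ == "__main__":
    # Hypothetical toy data: three non-null genes followed by three null genes.
    pv = [0.001, 0.02, 0.30, 0.04, 0.60, 0.90]
    null = [False, False, False, True, True, True]
    print(specificity(pv, null))  # 2 of 3 null genes have p > 0.05
    print(auc(pv, null))          # 8 of 9 pairs ranked correctly
```

Calling both functions on the p-values that each competing method (for example, t-test, rank sum, or a hybrid procedure) produces for the same simulated labels yields directly comparable specificities and AUCs. The pairwise loop favors clarity; for large gene sets, a rank-based computation would be faster.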

Updated: 2021-07-08