Size matters: how sample size affects the reproducibility and specificity of gene set analysis.
Human Genomics (IF 3.8), Pub Date: 2019-10-22, DOI: 10.1186/s40246-019-0226-2
Farhad Maleki, Katie Ovens, Ian McQuillan, Anthony J Kusalik

BACKGROUND: Gene set analysis is a well-established approach for interpretation of data from high-throughput gene expression studies. Achieving reproducible results is an essential requirement in such studies. One factor of a gene expression experiment that can affect reproducibility is the choice of sample size. However, choosing an appropriate sample size can be difficult, especially because the choice may be method-dependent. Further, sample size choice can have unexpected effects on specificity.

RESULTS: In this paper, we report on a systematic, quantitative approach to study the effect of sample size on the reproducibility of the results from 13 gene set analysis methods. We also investigate the impact of sample size on the specificity of these methods. Rather than relying on synthetic data, the proposed approach uses real expression datasets to offer an accurate and reliable evaluation.

CONCLUSION: Our findings show that, as a general pattern, the results of gene set analysis become more reproducible as sample size increases. However, the extent of reproducibility and the rate at which it increases vary from method to method. In addition, even in the absence of differential expression, some gene set analysis methods report a large number of false positives, and increasing sample size does not lead to reducing these false positives. The results of this research can be used when selecting a gene set analysis method from those available.

Updated: 2020-04-22