Gene set enrichment for reproducible science: comparison of CERNO and eight other algorithms


Bioinformatics (IF 4.4), Pub Date: 2019-06-04, DOI: 10.1093/bioinformatics/btz447
Joanna Zyla (1, 2), Michal Marczyk (1, 3), Teresa Domaszewska (2), Stefan H E Kaufmann (2), Joanna Polanska (1), January Weiner (2)


Motivation

Analysis of gene set (GS) enrichment is an essential part of functional omics studies. Here, we complement the established evaluation metrics of GS enrichment algorithms with a novel approach to assess the practical reproducibility of scientific results obtained from GS enrichment tests when applied to related data from different studies.

Results

We evaluated eight established algorithms and one novel algorithm for reproducibility, sensitivity, prioritization, false positive rate and computational time. In addition to eight established algorithms, we also included Coincident Extreme Ranks in Numerical Observations (CERNO), a flexible and fast algorithm based on modified Fisher P-value integration. Using real-world datasets, we demonstrate that CERNO is robust to ranking metrics, as well as sample and GS size. CERNO had the highest reproducibility while remaining sensitive, specific and fast. In the overall ranking, Pathway Analysis with Down-weighting of Overlapping Genes, CERNO and over-representation analysis performed best, while CERNO and GeneSetTest scored high in terms of reproducibility.
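The abstract describes CERNO only as "modified Fisher P-value integration" and gives no formula. The sketch below illustrates that idea under an explicit assumption: the statistic is F = -2 Σ ln(r_i / N), where r_i is the rank of each gene-set member in a list of N ranked genes, and F is compared against a chi-square distribution with 2k degrees of freedom for a set of k genes. The function names `chi2_sf_even_df` and `cerno_like_test` are hypothetical and are not the tmod API. The code is an illustration, not the authors' implementation.

```python
import math


def chi2_sf_even_df(x, k):
    """Upper-tail probability of a chi-square variable with 2*k degrees of freedom.

    For an even number of degrees of freedom the tail has a closed form:
    P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    (Numerically naive; adequate for small illustrative inputs only.)
    """
    half = x / 2.0
    term = 1.0   # current term (x/2)^j / j!, starting at j = 0
    total = 1.0  # running sum of the terms
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def cerno_like_test(ranked_genes, gene_set):
    """Fisher-style rank integration for one gene set (assumed form, see lead-in).

    ranked_genes: gene identifiers ordered from most to least extreme.
    gene_set: collection of gene identifiers forming the set under test.
    Returns (statistic, p_value).
    """
    members = set(gene_set)
    n = len(ranked_genes)
    # 1-based ranks of the gene-set members within the ranked list
    ranks = [i + 1 for i, gene in enumerate(ranked_genes) if gene in members]
    k = len(ranks)
    if k == 0:
        raise ValueError("no gene-set members found in the ranked list")
    # Treat rank / n like a P-value and combine the values as in Fisher's method
    statistic = -2.0 * sum(math.log(r / n) for r in ranks)
    return statistic, chi2_sf_even_df(statistic, k)


if __name__ == "__main__":
    genes = ["g%d" % i for i in range(1, 11)]  # toy ranked list, hypothetical IDs
    print(cerno_like_test(genes, {"g1", "g2"}))   # set concentrated at the top
    print(cerno_like_test(genes, {"g9", "g10"}))  # set concentrated at the bottom
```

Because the statistic depends only on ranks, any ordering of genes can be supplied, which fits the abstract's claim that CERNO is robust to the choice of ranking metric. A set whose members cluster near the top of the list gets a smaller P-value than one whose members sit near the bottom.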

Availability and implementation

The tmod package implementing the CERNO algorithm is available from CRAN (cran.r-project.org/web/packages/tmod/index.html), and an online implementation can be found at http://tmod.online/. The datasets analyzed in this study are widely available in the KEGGdzPathwaysGEO and KEGGandMetacoreDzPathwaysGEO R packages and the GEO repository.

Supplementary information

Supplementary data are available at Bioinformatics online.


Updated: 2020-01-13
