A benchmark of batch-effect correction methods for single-cell RNA sequencing data, Genome Biology



Genome Biology (IF 10.1), Pub Date: 2020-01-16, DOI: 10.1186/s13059-019-1850-9
Hoa Thi Nhu Tran, Kok Siong Ang, Marion Chevrier, Xiaomeng Zhang, Nicole Yee Shin Lee, Michelle Goh, Jinmiao Chen


Background: Large-scale single-cell transcriptomic datasets generated using different technologies contain batch-specific systematic variations that present a challenge to batch-effect removal and data integration. With continued growth expected in scRNA-seq data, achieving effective batch integration with available computational resources is crucial. Here, we perform an in-depth benchmark study on available batch correction methods to determine the most suitable method for batch-effect removal.

Results: We compare 14 methods in terms of computational runtime, the ability to handle large datasets, and batch-effect correction efficacy while preserving cell type purity. Five scenarios are designed for the study: identical cell types with different technologies, non-identical cell types, multiple batches, big data, and simulated data. Performance is evaluated using four benchmarking metrics: kBET, LISI, ASW, and ARI. We also investigate the use of batch-corrected data to study differential gene expression.

Conclusion: Based on our results, Harmony, LIGER, and Seurat 3 are the recommended methods for batch integration. Due to its significantly shorter runtime, Harmony is recommended as the first method to try, with the other methods as viable alternatives.
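Two of the four metrics named in the abstract, ASW (average silhouette width) and ARI (adjusted Rand index), can be illustrated with scikit-learn. The following is a minimal sketch on synthetic data, not the benchmark's actual pipeline: the helper name `score_embedding`, the toy two-dimensional embedding, and the use of k-means to obtain cluster labels are all assumptions made for illustration. kBET and LISI are not part of scikit-learn and are not shown.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score


def score_embedding(embedding, cell_types, batches, seed=0):
    """Hypothetical helper: summarize a (cells x dims) embedding with ARI and ASW.

    Assumption: clusters are obtained with k-means, using one cluster per
    annotated cell type. The paper's own clustering setup may differ.
    """
    n_types = len(np.unique(cell_types))
    clusters = KMeans(n_clusters=n_types, n_init=10, random_state=seed).fit_predict(embedding)
    return {
        # Agreement between clusters and cell-type labels (higher is better).
        "ari_cell_type": adjusted_rand_score(cell_types, clusters),
        # Silhouette with cell types as groups (higher means purer cell types).
        "asw_cell_type": silhouette_score(embedding, cell_types),
        # Silhouette with batches as groups (near zero means well-mixed batches).
        "asw_batch": silhouette_score(embedding, batches),
    }


# Toy data: two well-separated cell types, each split evenly across two batches,
# with no batch shift (mimicking an already well-integrated embedding).
rng = np.random.default_rng(0)
cell_types = np.repeat([0, 1], 100)
batches = np.tile([0, 1], 100)
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
embedding = centers[cell_types] + rng.normal(scale=0.5, size=(200, 2))

scores = score_embedding(embedding, cell_types, batches)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In this reading, a good integration would show a high cell-type ARI and ASW together with a batch ASW close to zero; real use would pass the low-dimensional output of a correction method in place of the toy embedding.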


Updated: 2020-01-16
