Evaluation of Cell Type Annotation R Packages on Single-cell RNA-seq Data
Genomics, Proteomics & Bioinformatics (IF 11.5); Pub Date: 2020-12-24; DOI: 10.1016/j.gpb.2020.07.004
Qianhui Huang¹, Yu Liu², Yuheng Du¹, Lana X Garmire²

Annotating cell types is a critical step in single-cell RNA sequencing (scRNA-seq) data analysis. Several supervised and semi-supervised classification methods have recently emerged to automate cell type identification. However, comprehensive evaluations of these methods are lacking. Moreover, it is not clear whether classification methods originally designed for other bulk omics data can be adapted to scRNA-seq analysis. In this study, we evaluated ten cell type annotation methods publicly available as R packages. Eight of them are popular methods developed specifically for single-cell research: Seurat, scmap, SingleR, CHETAH, SingleCellNet, scID, Garnett, and SCINA. The other two, linear constrained projection (CP) and robust partial correlations (RPC), were repurposed from DNA methylation deconvolution. We conducted systematic comparisons on a wide variety of public scRNA-seq datasets as well as simulated data. We assessed accuracy through intra-dataset and inter-dataset predictions; robustness against practical challenges such as gene filtering, high similarity among cell types, and an increasing number of cell type classes; and the detection of rare and unknown cell types. Overall, methods such as Seurat, SingleR, CP, RPC, and SingleCellNet performed well, with Seurat being the best at annotating major cell types. Additionally, Seurat, SingleR, CP, and RPC were more robust against downsampling. However, Seurat had a major drawback in predicting rare cell populations, and, compared with SingleR and RPC, it was suboptimal at distinguishing highly similar cell types. All the code and data are available from https://github.com/qianhuiSenn/scRNA_cell_deconv_benchmark.
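
CP was repurposed from DNA methylation deconvolution. The idea can be illustrated with a minimal Python sketch under stated assumptions. This is not the benchmark's R code.

Each query cell's expression vector y is modelled as X·w. The columns of X are reference cell-type profiles, and w is fitted by least squares subject to w ≥ 0 and Σw ≤ 1. The cell is then labelled with the reference type of largest weight; this sketch assumes that is how deconvolution weights become a single annotation.

A few further assumptions apply:
- Projected gradient descent is swapped in for a dedicated quadratic-programming solver.
- The function names, the toy reference matrix, and the labels A/B/C are hypothetical.
- Real scRNA-seq input would first need normalization and a shared set of marker genes.

```python
import numpy as np


def project_capped_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) <= 1}."""
    w = np.clip(v, 0.0, None)
    if w.sum() <= 1.0:
        return w
    # Sum constraint is active: project onto the probability simplex
    # using the sort-based algorithm of Duchi et al. (2008).
    u = np.sort(v)[::-1]
    cssv = np.cumsum(u) - 1.0
    idx = np.arange(1, v.size + 1)
    rho = idx[u - cssv / idx > 0][-1]
    theta = cssv[rho - 1] / rho
    return np.clip(v - theta, 0.0, None)


def constrained_projection(X, y, n_iter=2000):
    """Minimise ||y - X w||^2 subject to w >= 0 and sum(w) <= 1.

    Projected gradient descent is used here as a stand-in for a QP solver.
    """
    # Step size 1/L, where L = 2 * sigma_max(X)^2 is the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    w = np.full(X.shape[1], 1.0 / X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ w - y)
        w = project_capped_simplex(w - step * grad)
    return w


def annotate(reference, labels, query):
    """Label each query cell (column) with the reference type of largest weight."""
    calls, weights = [], []
    for j in range(query.shape[1]):
        w = constrained_projection(reference, query[:, j])
        weights.append(w)
        calls.append(labels[int(np.argmax(w))])
    return calls, np.array(weights)


# Hypothetical toy reference: 6 marker genes x 3 cell types (columns A, B, C).
reference = np.array([
    [5, 0, 0], [4, 0, 1], [0, 6, 0],
    [1, 5, 0], [0, 0, 7], [0, 1, 6],
], dtype=float)
labels = ["A", "B", "C"]
# Query cell 1 equals profile B; query cell 2 is 0.7 * A + 0.2 * C.
query = np.column_stack([
    reference[:, 1],
    0.7 * reference[:, 0] + 0.2 * reference[:, 2],
])
calls, weights = annotate(reference, labels, query)
print(calls)  # ['B', 'A']
```

In this toy case the recovered weights are close to (0, 1, 0) and (0.7, 0, 0.2). Because Σw is only required to be at most 1, a cell with lower overall expression than the reference profiles can still be fitted. RPC, the other repurposed method, is not sketched here.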

Updated: 2020-12-24