RUV-III-NB: normalization of single cell RNA-seq data
Nucleic Acids Research (IF 14.9), Pub Date: 2022-06-27, DOI: 10.1093/nar/gkac486
Agus Salim, Ramyar Molania, Jianan Wang, Alysha De Livera, Rachel Thijssen, Terence P Speed

Normalization of single-cell RNA-seq data remains a challenging task. The performance of different methods can vary greatly between datasets when unwanted factors and biology are associated. Most normalization methods also remove the effects of unwanted variation only from the cell embedding, not from the gene-level data typically used in differential expression (DE) analysis to identify marker genes. We propose RUV-III-NB, a method that can be used to remove unwanted variation from both the cell embedding and gene-level counts. Using pseudo-replicates, RUV-III-NB explicitly takes into account potential association with biology when removing unwanted variation. The method can be used with either UMI or read counts and returns adjusted counts that can be used for downstream analyses such as clustering, DE and pseudotime analyses. Using published datasets with different technological platforms, kinds of biology and levels of association between biology and unwanted variation, we show that RUV-III-NB removes library size and batch effects, strengthens biological signals, improves DE analyses, and leads to results exhibiting greater concordance with independent datasets of the same kind. The performance of RUV-III-NB is consistent and is not sensitive to the number of factors assumed to contribute to the unwanted variation.

Updated: 2022-06-27