Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression
Genome Biology (IF 12.3), Pub Date: 2019-12-01, DOI: 10.1186/s13059-019-1874-1
Christoph Hafemeister, Rahul Satija

Single-cell RNA-seq (scRNA-seq) data exhibits significant cell-to-cell variation due to technical factors, including the number of molecules detected in each cell, which can confound biological heterogeneity with technical effects. To address this, we present a modeling framework for the normalization and variance stabilization of molecular count data from scRNA-seq experiments. We propose that the Pearson residuals from “regularized negative binomial regression,” where cellular sequencing depth is utilized as a covariate in a generalized linear model, successfully remove the influence of technical characteristics from downstream analyses while preserving biological heterogeneity. Importantly, we show that an unconstrained negative binomial model may overfit scRNA-seq data, and overcome this by pooling information across genes with similar abundances to obtain stable parameter estimates. Our procedure omits the need for heuristic steps including pseudocount addition or log-transformation and improves common downstream analytical tasks such as variable gene selection, dimensional reduction, and differential expression. Our approach can be applied to any UMI-based scRNA-seq dataset and is freely available as part of the R package sctransform, with a direct interface to our single-cell toolkit Seurat.
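
The idea of using Pearson residuals from a negative binomial model with sequencing depth as a covariate can be illustrated with a minimal sketch. This is not the sctransform implementation: it assumes a fixed dispersion `theta` for every gene and a depth slope fixed at 1 (so the expected count is simply proportional to cell depth), and it skips the per-gene regression and the regularization step across genes that the abstract describes. The function name `nb_pearson_residuals` and the default `theta=100.0` are hypothetical choices made for illustration.

```python
import numpy as np


def nb_pearson_residuals(counts, theta=100.0):
    """Simplified NB Pearson residuals for a genes x cells UMI count matrix.

    Assumptions (not taken from the paper's code): one shared dispersion
    `theta`, and an expected count proportional to cell sequencing depth.
    """
    counts = np.asarray(counts, dtype=float)
    depth = counts.sum(axis=0, keepdims=True)        # 1 x cells: molecules per cell
    gene_total = counts.sum(axis=1, keepdims=True)   # genes x 1: molecules per gene
    total = counts.sum()

    # Expected count for each gene in each cell under the depth-only model.
    mu = gene_total @ depth / total

    # Negative binomial variance: mu + mu^2 / theta.
    var = mu + mu ** 2 / theta

    # Pearson residual = (observed - expected) / standard deviation.
    # Entries with zero variance (e.g. genes never detected) are left at 0.
    resid = np.zeros_like(mu)
    np.divide(counts - mu, np.sqrt(var), out=resid, where=var > 0)
    return resid


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_counts = rng.poisson(lam=2.0, size=(5, 8))   # 5 genes, 8 cells
    print(nb_pearson_residuals(toy_counts).round(2))
```

In this simplified model, a gene whose counts scale exactly with cell depth gets residuals of zero, which is the sense in which the residuals remove the depth effect. The published method instead estimates per-gene parameters and then pools them across genes of similar abundance.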

Updated: 2019-12-01