Dimension reduction and denoising of single-cell RNA sequencing data in the presence of observed confounding variables
bioRxiv - Genomics Pub Date : 2020-08-04 , DOI: 10.1101/2020.08.03.234765
Mo Huang, Zhaojun Zhang, Nancy R. Zhang

Confounding variation, such as batch effects, is a pervasive issue in single-cell RNA sequencing experiments. While methods exist for aligning cells across batches, it remains unclear how to correct for other types of confounding variation that may be observed at the subject level, such as age and sex, and at the cell level, such as library size and other measures of cell quality. On the specific problem of batch alignment, many questions persist despite recent advances: existing methods can effectively align batches in low-dimensional representations of cells, yet their effectiveness in aligning the original gene expression matrices is unclear. Nor is it clear how batch correction can be performed alongside data denoising, the former treating technical biases due to experimental stratification and the latter treating technical variation inherent to the random sampling that occurs during library construction and sequencing. Here, we propose SAVERCAT, a method for dimension reduction and denoising of single-cell gene expression data that can flexibly adjust for arbitrary observed covariates. We benchmark SAVERCAT against existing single-cell batch correction methods and show that, while it matches the best of the field in low-dimensional cell alignment, it significantly improves upon existing methods on the task of batch correction in the high-dimensional expression matrix. We also demonstrate the ability of SAVERCAT to effectively integrate batch correction and denoising through a data down-sampling experiment. Finally, we apply SAVERCAT to a single-cell study of Alzheimer's disease where batch is confounded with the contrast of interest, and demonstrate how adjusting for covariates other than batch allows for more interpretable analysis.

Updated: 2020-08-05