Integration and transfer learning of single-cell transcriptomes via cFIT [Statistics]
Proceedings of the National Academy of Sciences of the United States of America (IF 9.4), Pub Date: 2021-03-09, DOI: 10.1073/pnas.2024383118
Minshi Peng 1, Yue Li 1, Brie Wamsley 2, Yuting Wei 1, Kathryn Roeder 3,4

Large, comprehensive collections of single-cell RNA sequencing (scRNA-seq) datasets have been generated that allow for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. As new methods arise to measure distinct cellular modalities, a key analytical challenge is to integrate these datasets or transfer knowledge from one to the other to better understand cellular identity and functions. Here, we present a simple yet surprisingly effective method named common factor integration and transfer learning (cFIT) for capturing various batch effects across experiments, technologies, subjects, and even species. The proposed method models the shared information between various datasets by a common factor space while allowing for unique distortions and shifts in genewise expression in each batch. The model parameters are learned under an iterative nonnegative matrix factorization (NMF) framework and then used for synchronized integration from across-domain assays. In addition, the model enables transferring via low-rank matrix from more informative data to allow for precise identification in data of lower quality. Compared with existing approaches, our method imposes weaker assumptions on the cell composition of each individual dataset; however, it is shown to be more reliable in preserving biological variations. We apply cFIT to multiple scRNA-seq datasets of developing brain from human and mouse, varying by technologies and developmental stages. The successful integration and transfer uncover the transcriptional resemblance across systems. The study helps establish a comprehensive landscape of brain cell-type diversity and provides insights into brain development.
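The modeling idea in the abstract, a common factor space plus batch-specific genewise distortions and shifts, can be illustrated with a small simulation. The sketch below is not the cFIT implementation: it assumes, as one reading of the abstract, that each batch is a shared low-rank signal multiplied by a per-gene scale and offset by a per-gene shift, and it takes those parameters as known rather than estimating them with iterative NMF. All names (`simulate_batch`, `remove_batch_effect`, `scale`, `shift`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_genes, n_factors = 50, 4
# Shared nonnegative gene-by-factor matrix: the assumed "common factor space".
W = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_factors))


def simulate_batch(n_cells):
    """Draw one noise-free batch as X = (H W^T) * scale + shift (assumed form)."""
    # Nonnegative cell-by-factor loadings; each row sums to 1.
    H = rng.dirichlet(np.ones(n_factors), size=n_cells)
    scale = rng.uniform(0.5, 2.0, size=n_genes)  # batch-specific genewise distortion
    shift = rng.uniform(0.0, 1.0, size=n_genes)  # batch-specific genewise shift
    X = (H @ W.T) * scale + shift                # broadcasts over cells
    return X, H, scale, shift


def remove_batch_effect(X, scale, shift):
    """Undo the genewise shift and scaling so the batch lies in the shared space."""
    return (X - shift) / scale


# Two batches with different sizes and different distortions.
batches = [simulate_batch(n) for n in (30, 45)]

# Integrated data: every batch mapped back onto the common signal H W^T.
integrated = np.vstack([remove_batch_effect(X, s, b) for X, _, s, b in batches])
shared = np.vstack([H @ W.T for _, H, _, _ in batches])
```

In this noise-free setting `integrated` matches `shared` up to floating-point error. In practice the factor matrix, loadings, scales, and shifts would all have to be estimated from the data, which is the role the abstract assigns to the iterative NMF procedure.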




Updated: 2021-03-04