scLM: Automatic Detection of Consensus Gene Clusters Across Multiple Single-cell Datasets
Genomics, Proteomics & Bioinformatics (IF 9.5), Pub Date: 2020-12-24, DOI: 10.1016/j.gpb.2020.09.002
Qianqian Song, Jing Su, Lance D Miller, Wei Zhang

In gene expression profiling studies, including single-cell RNA sequencing (scRNA-seq) analyses, the identification and characterization of co-expressed genes provide critical information on cell identity and function. Gene co-expression clustering in scRNA-seq data presents certain challenges. We show that commonly used methods for single-cell data cannot identify co-expressed genes accurately, and that their results fall substantially short of biological expectations for co-expressed genes. Herein, we present the single-cell Latent-variable Model (scLM), a gene co-clustering algorithm tailored to single-cell data that performs well at detecting gene clusters with significant biological context. Importantly, scLM can cluster multiple single-cell datasets simultaneously (i.e., consensus clustering), enabling users to leverage single-cell data from multiple sources for novel comparative analysis. scLM takes raw count data as input and preserves biological variation without being influenced by batch effects across datasets. Results on both simulated and experimental data demonstrate that scLM outperforms existing methods, with considerably improved accuracy. To illustrate the biological insights that scLM can provide, we apply it to our in-house and public experimental scRNA-seq datasets. scLM identifies novel functional gene modules and refines cell states, which facilitates mechanism discovery and the understanding of complex biosystems such as cancers. A user-friendly R package with all the key features of the scLM method is available at https://github.com/QSong-github/scLM.
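
To make the idea of consensus gene clusters across datasets concrete, here is a minimal Python sketch. It is not the scLM latent-variable model and does not use the scLM R package. It is a naive baseline that averages per-dataset gene-gene correlations and links genes whose average correlation exceeds a cutoff. The function names, the 0.5 threshold, and the toy count matrices are all assumptions made up for illustration.

```python
import math


def log_normalize(counts):
    """Scale each cell (column) to the mean library size, then apply log1p."""
    n_cells = len(counts[0])
    lib = [sum(row[j] for row in counts) for j in range(n_cells)]
    target = sum(lib) / n_cells
    # "or 1" guards against division by zero for an all-zero cell.
    return [[math.log1p(row[j] * target / (lib[j] or 1)) for j in range(n_cells)]
            for row in counts]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def consensus_gene_clusters(datasets, threshold=0.5):
    """Group genes whose correlation, averaged over datasets, exceeds threshold.

    Each dataset is a genes x cells list of raw counts; all datasets must list
    the same genes in the same order. Hypothetical helper, not the scLM API.
    """
    n_genes = len(datasets[0])
    normed = [log_normalize(d) for d in datasets]
    parent = list(range(n_genes))  # union-find forest over gene indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            mean_r = sum(pearson(d[i], d[j]) for d in normed) / len(normed)
            if mean_r > threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for g in range(n_genes):
        clusters.setdefault(find(g), []).append(g)
    return sorted(clusters.values())


# Toy data (invented): 4 genes; dataset_b is sequenced about 3x deeper.
dataset_a = [
    [10, 12, 11, 1, 0, 2],
    [8, 9, 10, 0, 1, 1],
    [0, 1, 2, 9, 11, 10],
    [1, 0, 1, 12, 10, 11],
]
dataset_b = [
    [30, 33, 2, 3, 29],
    [25, 28, 1, 2, 27],
    [3, 2, 31, 28, 1],
    [2, 4, 27, 30, 3],
]

clusters = consensus_gene_clusters([dataset_a, dataset_b])
print(clusters)
```

On this toy input, genes 0 and 1 fall into one cluster and genes 2 and 3 into another, despite the depth difference between the two datasets. Because correlations are computed within each dataset before averaging, a dataset-wide scale shift does not drive the grouping. This handles only that simple kind of batch effect and should not be read as equivalent to the model-based approach described in the abstract.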



Updated: 2020-12-24