Automatic identification of relevant genes from low-dimensional embeddings of single cell RNAseq data
Bioinformatics (IF 5.8), Pub Date: 2020-03-24, DOI: 10.1093/bioinformatics/btaa198
Philipp Angerer 1,2, David S Fischer 1,2, Fabian J Theis 1, Antonio Scialdone 1,3,4, Carsten Marr 1

Dimensionality reduction is a key step in the analysis of single-cell RNA sequencing data. It produces a low-dimensional embedding for visualization and as a calculation base for downstream analysis. Nonlinear techniques are most suitable to handle the intrinsic complexity of large, heterogeneous single-cell data. However, with no linear relation between gene and embedding coordinate, there is no way to extract the identity of genes driving any cell's position in the low-dimensional embedding, making it more difficult to characterize the underlying biological processes.

In this paper, we introduce the concepts of local and global gene relevance to compute an equivalent of principal component analysis loadings for nonlinear low-dimensional embeddings. Global gene relevance identifies drivers of the overall embedding, while local gene relevance identifies those of a defined subregion. We apply our method to single-cell RNAseq datasets from different experimental protocols and to different low-dimensional embedding techniques. This shows our method's versatility to identify key genes for a variety of biological processes.

To ensure reproducibility and ease of use, our method is released as part of destiny 3.0, a popular R package for building diffusion maps from single-cell transcriptomic data. It is readily available through Bioconductor.
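
The abstract does not spell out how relevance is computed, so the following Python sketch only illustrates the general idea of a loading-like score for a nonlinear embedding. It rests on explicit assumptions that are not taken from the paper: local relevance is approximated by the norm of a least-squares gradient of each gene's expression with respect to the embedding coordinates over a cell's nearest neighbours, and global relevance by how often a gene ranks among the top local scores. The function names `local_relevance` and `global_relevance`, the parameters `k` and `top`, and the synthetic data are hypothetical and are not the destiny API.

```python
import numpy as np


def local_relevance(expr, emb, k=10):
    """Illustrative proxy (an assumption, not destiny's implementation):
    for each cell, fit a least-squares gradient of every gene's expression
    with respect to the embedding coordinates over the cell's k nearest
    neighbours, and score each gene by the norm of that gradient."""
    n_cells, n_genes = expr.shape
    # Squared Euclidean distances between all cells in embedding space.
    diff = emb[:, None, :] - emb[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]

    scores = np.zeros((n_cells, n_genes))
    for c in range(n_cells):
        d_emb = emb[neighbours[c]] - emb[c]      # shape (k, n_dims)
        d_expr = expr[neighbours[c]] - expr[c]   # shape (k, n_genes)
        # Solve d_emb @ grad ~ d_expr; grad has shape (n_dims, n_genes).
        grad, *_ = np.linalg.lstsq(d_emb, d_expr, rcond=None)
        scores[c] = np.linalg.norm(grad, axis=0)
    return scores


def global_relevance(local_scores, top=2):
    """Fraction of cells in which each gene ranks among the `top` locally
    highest-scoring genes (a hypothetical aggregation rule)."""
    top_idx = np.argsort(-local_scores, axis=1)[:, :top]
    counts = np.bincount(top_idx.ravel(), minlength=local_scores.shape[1])
    return counts / local_scores.shape[0]


# Synthetic example: 300 cells, 6 genes, a given 2-D embedding.
# Genes 0 and 1 vary nonlinearly along the embedding axes; genes 2-5 are noise.
rng = np.random.default_rng(0)
emb = rng.uniform(0.0, 1.0, size=(300, 2))
expr = rng.normal(0.0, 0.01, size=(300, 6))
expr[:, 0] = 5.0 * emb[:, 0] + 5.0 * emb[:, 0] ** 2
expr[:, 1] = 5.0 * np.exp(emb[:, 1])

local_scores = local_relevance(expr, emb, k=10)
global_scores = global_relevance(local_scores, top=2)
print(np.round(global_scores, 2))
```

In this toy setting the two genes constructed to change along the embedding should dominate the global score, while restricting `local_scores` to the rows of a chosen subset of cells would give the analogous ranking for a subregion. For real analyses, the implementation shipped with destiny on Bioconductor is the reference.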

Updated: 2020-03-24