Resolving single-cell heterogeneity from hundreds of thousands of cells through sequential hybrid clustering and NMF.
Bioinformatics ( IF 5.8 ) Pub Date : 2020-03-24 , DOI: 10.1093/bioinformatics/btaa201
Meenakshi Venkatasubramanian 1, 2 , Kashish Chetal 2 , Daniel J Schnell 2 , Gowtham Atluri 1 , Nathan Salomonis 2, 3

MOTIVATION: The rapid proliferation of single-cell RNA-Sequencing (scRNA-Seq) technologies has spurred the development of diverse computational approaches to detect transcriptionally coherent populations. While the complexity of the algorithms for detecting heterogeneity has increased, most require significant user-tuning, are heavily reliant on dimension reduction techniques and are not scalable to ultra-large datasets. We previously described a multi-step algorithm, Iterative Clustering and Guide-gene selection (ICGS), which applies intra-gene correlation and hybrid clustering to uniquely resolve novel transcriptionally coherent cell populations from an intuitive graphical user interface.

RESULTS: We describe a new iteration of ICGS that outperforms state-of-the-art scRNA-Seq detection workflows when applied to well-established benchmarks. This approach combines multiple complementary subtype detection methods (HOPACH, sparse-NMF, cluster "fitness", SVM) to resolve rare and common cell-states, while minimizing differences due to donor or batch effects. Using data from multiple cell atlases, we show that the PageRank algorithm effectively down-samples ultra-large scRNA-Seq datasets, without losing extremely rare or transcriptionally similar yet distinct cell-types and while recovering novel transcriptionally distinct cell populations. We believe this new approach holds tremendous promise in reproducibly resolving hidden cell populations in complex datasets.

AVAILABILITY AND IMPLEMENTATION: ICGS2 is implemented in Python. The source code and documentation are available at: http://altanalyze.org.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
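
The abstract states that PageRank is used to down-sample ultra-large datasets but gives no implementation details. The sketch below illustrates the general idea only and is not the ICGS2 code: it assumes a k-nearest-neighbour graph over cells, scores cells by power-iteration PageRank, and keeps the highest-scoring ones. The function names (`knn_graph`, `pagerank`, `downsample`), the Euclidean distance, the damping factor of 0.85 and the top-score selection rule are all illustrative assumptions.

```python
import math
import random


def knn_graph(cells, k):
    """Map each cell index to the indices of its k nearest cells.

    Assumption: plain Euclidean distance on the expression vectors.
    """
    graph = {}
    for i, a in enumerate(cells):
        dists = sorted(
            (math.dist(a, b), j) for j, b in enumerate(cells) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph


def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a dict of node -> out-neighbours."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v, out in graph.items() if not out)
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in graph}
        for v, out in graph.items():
            for w in out:
                new[w] += damping * rank[v] / len(out)
        delta = sum(abs(new[v] - rank[v]) for v in graph)
        rank = new
        if delta < tol:
            break
    return rank


def downsample(cells, n_keep, k=10):
    """Keep the n_keep cells with the highest PageRank score (illustrative rule)."""
    rank = pagerank(knn_graph(cells, k))
    keep = sorted(rank, key=rank.get, reverse=True)[:n_keep]
    return sorted(keep), rank


# Toy usage on random data standing in for an expression matrix.
random.seed(0)
cells = [[random.gauss(0, 1) for _ in range(5)] for _ in range(200)]
kept, rank = downsample(cells, n_keep=50, k=10)
print(len(kept), "of", len(cells), "cells retained")
```

This toy version computes all pairwise distances, so it is quadratic in the number of cells and only suitable for small examples. It also makes no attempt to show that rare populations survive the selection, which is the property the abstract reports for ICGS2.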

Updated: 2020-03-24