SSCC: A Novel Computational Framework for Rapid and Accurate Clustering Large-scale Single Cell RNA-seq Data. Genomics, Proteomics & Bioinformatics



Genomics, Proteomics & Bioinformatics (IF 11.5), Pub Date: 2019-06-16, DOI: 10.1016/j.gpb.2018.10.003
Xianwen Ren, Liangtao Zheng, Zemin Zhang


Clustering is a prevalent means of analyzing single cell RNA sequencing (scRNA-seq) data, but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are urgently needed. Here we propose Spearman subsampling-clustering-classification (SSCC), a new clustering framework based on random projection and feature construction, for large-scale scRNA-seq data. SSCC greatly improves the clustering accuracy, robustness, and computational efficiency of various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset of 68,578 human blood cells, SSCC achieved a 20% improvement in clustering accuracy and a 50-fold acceleration while consuming only 66% of the memory, compared to the widely used software package SC3. Compared to k-means, the accuracy improvement of SSCC can reach 3-fold. An R implementation of SSCC is available at https://github.com/Japrin/sscClust.
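The three stages named in the abstract (subsampling, clustering, classification) can be illustrated with a minimal NumPy sketch. This is one possible reading of the idea, not the sscClust implementation: the function names (`rank_rows`, `spearman_features`, `kmeans_centers`, `sscc_sketch`), the use of k-means on the subsample, and the nearest-centroid classifier are all assumptions made for illustration, and the random-projection step mentioned in the abstract is not modeled here.

```python
import numpy as np

def rank_rows(x):
    """Ordinal ranks per row. Ties are broken by position, which is a
    simplification; real scRNA-seq data has many ties (zeros) and would
    need average ranks instead."""
    order = np.argsort(x, axis=1)
    ranks = np.empty(order.shape, dtype=float)
    rows = np.arange(x.shape[0])[:, None]
    ranks[rows, order] = np.arange(x.shape[1])
    return ranks

def spearman_features(cells, reference):
    """Spearman correlation of each cell (row) with each reference cell.
    Returns an array of shape (n_cells, n_reference)."""
    a, b = rank_rows(cells), rank_rows(reference)
    a -= a.mean(axis=1, keepdims=True)
    b -= b.mean(axis=1, keepdims=True)
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    b /= np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def assign(x, centers):
    """Label each row of x with the index of its nearest center."""
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans_centers(x, k, n_iter=20):
    """Plain Lloyd's k-means with deterministic farthest-point
    initialisation (an assumption chosen to keep the sketch short)."""
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = assign(x, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return centers

def sscc_sketch(expr, k, n_sub, seed=0):
    """expr: (n_cells, n_genes) expression matrix. Returns one label per cell."""
    rng = np.random.default_rng(seed)
    # 1. Subsampling: pick a random subset of cells.
    sub = rng.choice(expr.shape[0], size=n_sub, replace=False)
    # Feature construction: describe every cell by its Spearman
    # correlation with each subsampled cell.
    feats = spearman_features(expr, expr[sub])
    # 2. Clustering: cluster only the subsample.
    centers = kmeans_centers(feats[sub], k)
    # 3. Classification: give every cell the label of its nearest centroid.
    return assign(feats, centers)

# Toy data (hypothetical): two groups of cells with opposite marker genes.
demo_rng = np.random.default_rng(42)
demo = demo_rng.normal(size=(200, 40))
demo[:100, :20] += 5.0
demo[100:, 20:] += 5.0
demo_labels = sscc_sketch(demo, k=2, n_sub=30)
print(np.bincount(demo_labels))
```

In this sketch the expensive clustering step only ever sees `n_sub` cells, and the remaining cells cost one correlation pass plus a nearest-centroid lookup, which is the kind of saving a subsample-then-classify design aims for. Any other clustering method or classifier could be swapped into steps 2 and 3.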


Updated: 2019-11-01
