ScCAEs: deep clustering of single-cell RNA-seq via convolutional autoencoder embedding and soft K-means
Briefings in Bioinformatics (IF 6.8), Pub Date: 2021-08-15, DOI: 10.1093/bib/bbab321
Hang Hu 1, Zhong Li 2, Xiangjie Li 3, Minzhe Yu 1, Xiutao Pan 1

Clustering and cell type classification are vital steps in analyzing scRNA-seq data to reveal the complexity of a tissue (e.g. the number of cell types and the transcriptional characteristics of each cell type). Recently, deep learning-based single-cell clustering algorithms have become popular because they integrate dimensionality reduction with clustering. However, these methods still cluster unstably on scRNA-seq datasets with high dropout rates or noise. In this study, a novel deep embedding clustering method for single-cell RNA-seq data, based on convolutional autoencoder embedding and soft K-means (scCAEs), is proposed; it learns the feature representation and the clustering simultaneously. It uses a convolutional autoencoder to characterize scRNA-seq data and proposes a regularized soft K-means algorithm to cluster cell populations in the learned latent space. A novel constraint is then introduced into the clustering objective function to iteratively optimize the clustering results, and, more importantly, optimization of this objective function is theoretically proved to converge. Moreover, the reconstruction loss is added to the objective function, combining dimensionality reduction with clustering to find an embedding space better suited to clustering. The proposed method is validated on a variety of datasets, in which the number of clusters ranges from 4 to 46 and the number of cells ranges from 90 to 30,302. The experimental results show that scCAEs outperforms other state-of-the-art methods on these datasets while retaining satisfactory compatibility and robustness. In addition, for single-cell datasets with batch effects, scCAEs can preserve cell separation while removing batch effects.
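
To make the soft K-means idea concrete, here is a minimal NumPy sketch of plain soft K-means on points that stand in for a learned latent space. It is not the scCAEs objective: the abstract does not give the regularization term, the added constraint, or the autoencoder architecture, so none of them appear here. The names `init_centers` and `soft_kmeans`, the temperature `beta`, and the farthest-point initialization are all assumptions made for illustration.

```python
import numpy as np


def init_centers(z, n_clusters):
    """Farthest-point initialization (an illustrative choice, not from the paper)."""
    centers = [z[0]]
    for _ in range(n_clusters - 1):
        # Squared distance from every point to its nearest chosen center.
        d2 = np.min([((z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(z[np.argmax(d2)])
    return np.array(centers, dtype=float)


def soft_kmeans(z, n_clusters, beta=5.0, n_iter=50):
    """Generic soft K-means on latent vectors z of shape (n_cells, n_dims).

    Returns soft assignments (n_cells, n_clusters) and cluster centers.
    beta is a hypothetical temperature: larger values give harder assignments.
    """
    centers = init_centers(z, n_clusters)
    for _ in range(n_iter):
        # Squared Euclidean distance between every cell and every center.
        d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        # Softmax over negative scaled distances, stabilized by subtracting the row max.
        logits = -beta * d2
        logits -= logits.max(axis=1, keepdims=True)
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)
        # Each center becomes the responsibility-weighted mean of the cells.
        weights = np.maximum(resp.sum(axis=0), 1e-12)
        centers = (resp.T @ z) / weights[:, None]
    return resp, centers


# Toy usage: two well-separated groups standing in for latent embeddings.
rng = np.random.default_rng(0)
latent = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(10.0, 0.1, (20, 2))])
assignments, cluster_centers = soft_kmeans(latent, n_clusters=2)
labels = assignments.argmax(axis=1)
```

In a joint model of the kind the abstract describes, the latent vectors would come from the encoder and the clustering term would be optimized together with the reconstruction loss, rather than run once on fixed embeddings as in this sketch.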

Updated: 2021-08-15