scHiCStackL: a stacking ensemble learning-based method for single-cell Hi-C classification using cell embedding
Briefings in Bioinformatics (IF 9.5), Pub Date: 2021-09-08, DOI: 10.1093/bib/bbab396
Hao Wu¹,², Yingfu Wu¹, Yuhong Jiang¹, Bing Zhou¹, Haoru Zhou¹, Zhongli Chen¹, Yi Xiong³, Quanzhong Liu¹, Hongming Zhang¹

Single-cell Hi-C data are a common data source for studying differences in the three-dimensional structure of chromosomes between cells. Advances in single-cell Hi-C technology make it possible to obtain single-cell Hi-C data in batches, and discriminating cell types quickly and effectively has become an active research topic. However, existing computational methods that predict cell types from Hi-C data have been found to have low accuracy. We therefore propose scHiCStackL, a high-accuracy cell classification algorithm based on single-cell Hi-C data. We first improve an existing data preprocessing method for single-cell Hi-C data so that the generated cell embeddings better represent cells. We then construct a two-layer stacking ensemble model to classify cells. Experimental results show that, compared with the cell embeddings generated by the previously published method scHiCluster, the cell embeddings generated by our preprocessing method improve the Acc, MCC, F1 and Precision confidence intervals by 0.23%, 1.22%, 1.46% and 1.61%, respectively, on the task of classifying human cells in the ML1 and ML3 datasets. When the two-layer stacking ensemble framework is combined with these cell embeddings, scHiCStackL improves on scHiCluster by 13.33, 19, 19.27 and 14.5 in terms of the Acc, ARI, NMI and F1 confidence intervals, respectively. In summary, scHiCStackL achieves superior performance in predicting cell types from single-cell Hi-C data. The webserver and source code of scHiCStackL are freely available at http://hww.sdu.edu.cn:8002/scHiCStackL/ and https://github.com/HaoWuLab-Bioinformatics/scHiCStackL, respectively.
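The general idea of a two-layer stacking classifier on top of cell embeddings can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the scHiCStackL implementation: the synthetic feature matrix stands in for preprocessed single-cell Hi-C data, PCA stands in for the cell-embedding step, and the base learners (`RidgeClassifier`, an RBF-kernel `SVC`) and the logistic-regression meta-learner are illustrative choices. The helper names `build_model` and `evaluate` are hypothetical; the authors' actual configuration is in their GitHub repository.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.metrics import (
    accuracy_score,
    adjusted_rand_score,
    f1_score,
    matthews_corrcoef,
    normalized_mutual_info_score,
    precision_score,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_model(n_components=20, seed=0):
    """Hypothetical helper: PCA embedding followed by a two-layer stack.

    ASSUMPTION: PCA stands in for the cell-embedding step, and the base
    learners / meta-learner below are illustrative, not scHiCStackL's own.
    """
    # Layer 1: heterogeneous base learners trained on the cell embeddings.
    base_learners = [
        ("ridge", RidgeClassifier()),
        ("svm", SVC(kernel="rbf", random_state=seed)),
    ]
    # Layer 2: a meta-learner fitted on out-of-fold base-learner outputs
    # (cv=5) rather than on predictions for the learners' own training rows.
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components, random_state=seed),
        stack,
    )


def evaluate(y_true, y_pred):
    """Metrics named in the abstract, computed on predicted vs. true labels."""
    return {
        "Acc": accuracy_score(y_true, y_pred),
        "MCC": matthews_corrcoef(y_true, y_pred),
        "F1": f1_score(y_true, y_pred, average="macro"),
        "Precision": precision_score(
            y_true, y_pred, average="macro", zero_division=0
        ),
        "ARI": adjusted_rand_score(y_true, y_pred),
        "NMI": normalized_mutual_info_score(y_true, y_pred),
    }


if __name__ == "__main__":
    # Synthetic stand-in for per-cell feature vectors (e.g. flattened,
    # preprocessed contact matrices); no real single-cell Hi-C data here.
    X, y = make_classification(
        n_samples=400, n_features=200, n_informative=20,
        n_classes=4, n_clusters_per_class=1, random_state=0,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    model = build_model(n_components=20)
    model.fit(X_tr, y_tr)
    for name, value in evaluate(y_te, model.predict(X_te)).items():
        print(f"{name}: {value:.3f}")
```

Because `StackingClassifier` fits the meta-learner on cross-validated predictions from the base learners, the second layer learns how to weight the first-layer outputs without being biased by their training-set fit. ARI and NMI are clustering-agreement scores; computing them on predicted versus true labels is one way to compare a classifier against a clustering method such as scHiCluster.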

Updated: 2021-09-08