I-Impute: a self-consistent method to impute single cell RNA sequencing data
BMC Genomics (IF 4.4), Pub Date: 2020-11-18, DOI: 10.1186/s12864-020-07007-w
Xikang Feng, Lingxi Chen, Zishuai Wang, Shuai Cheng Li

Single-cell RNA-sequencing (scRNA-seq) is becoming indispensable in the study of cell-specific transcriptomes. However, in scRNA-seq techniques, only a small fraction of the genes are captured due to “dropout” events. These dropout events require intensive treatment when analyzing scRNA-seq data. For example, imputation tools have been proposed to estimate dropout events and de-noise data. The performance of these imputation tools is often evaluated, or fine-tuned, using various clustering criteria based on ground-truth cell subgroup labels. This limits their effectiveness in cases where cell subgroup knowledge is lacking. We consider an alternative strategy which requires the imputation to follow a “self-consistency” principle; that is, the imputation process refines its results until no internal inconsistencies or dropouts remain in the data.

We propose the use of “self-consistency” as the main criterion in performing imputation. To demonstrate this principle we devised I-Impute, a “self-consistent” method, to impute scRNA-seq data. I-Impute iteratively refines continuous similarities and dropout probabilities until a self-consistent imputation is reached. On the in silico datasets, I-Impute consistently exhibited the highest Pearson correlations across different dropout rates compared with the state-of-the-art methods SAVER and scImpute. Furthermore, we collected three wet-lab datasets (mouse bladder cells, embryonic stem cells, and aortic leukocyte cells) to evaluate the tools. I-Impute exhibited feasible cell subpopulation discovery efficacy on all three datasets, and it achieved the highest clustering accuracy compared with SAVER and scImpute.

A strategy based on “self-consistency”, captured through our method, I-Impute, gave imputation results better than the state-of-the-art tools. Source code of I-Impute can be accessed at https://github.com/xikanfeng2/I-Impute.
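
The abstract does not spell out the algorithm, so the following is only a toy illustration of the general idea, not the I-Impute implementation. It assumes a small synthetic cells-by-genes matrix with two cell groups, simulates dropout by zeroing random entries, and refills the zeros with a similarity-weighted average of the other cells, repeating until a further pass no longer changes the result (a simple stand-in for “self-consistency”). The function names `simulate_dropout`, `iterative_impute`, and `pearson`, and all parameter values, are hypothetical. Quality is scored with the Pearson correlation against the known ground truth, mirroring the in silico evaluation described above.

```python
import numpy as np


def simulate_dropout(expr, rate, rng):
    """Zero out a random fraction `rate` of entries to mimic dropout events."""
    mask = rng.random(expr.shape) < rate
    observed = expr.copy()
    observed[mask] = 0.0
    return observed, mask


def iterative_impute(observed, n_iter=50, tol=1e-6):
    """Toy fixed-point imputation (an assumption, NOT the I-Impute algorithm).

    Zeros are treated as dropouts and refilled with a similarity-weighted
    average of the other cells. The loop stops once another pass changes
    no entry by more than `tol`.
    """
    zero = observed == 0
    current = observed.copy()
    for _ in range(n_iter):
        # Cell-cell similarity (rows are cells) from the current estimate.
        sim = np.nan_to_num(np.corrcoef(current))
        sim = np.clip(sim, 0.0, None)   # ignore anti-correlated cells
        np.fill_diagonal(sim, 0.0)      # a cell does not vote for itself
        weights = sim / np.maximum(sim.sum(axis=1, keepdims=True), 1e-12)
        proposal = weights @ current
        # Only dropout candidates are replaced; observed values are kept.
        updated = np.where(zero, proposal, observed)
        change = np.max(np.abs(updated - current))
        current = updated
        if change < tol:
            break
    return current


def pearson(a, b):
    """Pearson correlation between two matrices, flattened."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])


# Synthetic ground truth: 2 cell groups x 20 cells each, 100 genes.
rng = np.random.default_rng(0)
profiles = rng.gamma(shape=2.0, scale=2.0, size=(2, 100))
truth = np.repeat(profiles, 20, axis=0) + rng.normal(0.0, 0.3, size=(40, 100))
truth = np.clip(truth, 0.1, None)  # keep truth strictly positive

observed, dropout_mask = simulate_dropout(truth, rate=0.3, rng=rng)
imputed = iterative_impute(observed)

r_observed = pearson(observed, truth)
r_imputed = pearson(imputed, truth)
print(f"Pearson vs. truth: observed={r_observed:.3f}, imputed={r_imputed:.3f}")
```

Because the ground truth is clipped to be strictly positive, every zero in `observed` is a simulated dropout. Real scRNA-seq data also contains true biological zeros, which is why the paper's method estimates dropout probabilities rather than treating every zero as missing.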

Updated: 2020-11-19