A comparison framework and guideline of clustering methods for mass cytometry data
Genome Biology (IF 10.1), Pub Date: 2019-12-01, DOI: 10.1186/s13059-019-1917-7
Xiao Liu, Weichen Song, Brandon Y Wong, Ting Zhang, Shunying Yu, Guan Ning Lin, Xianting Ding

Background: With the expanding applications of mass cytometry in medical research, a wide variety of clustering methods, both semi-supervised and unsupervised, have been developed for data analysis. Selecting the optimal clustering method can accelerate the identification of meaningful cell populations.

Results: To address this issue, we compared nine methods using three classes of performance measures: “precision” as external evaluation, “coherence” as internal evaluation, and stability. Seven unsupervised methods (Accense, Xshift, PhenoGraph, FlowSOM, flowMeans, DEPECHE, and kmeans) and two semi-supervised methods (Automated Cell-type Discovery and Classification, and linear discriminant analysis (LDA)) were tested on six independent benchmark mass cytometry datasets. For each method, we computed and compared all defined performance measures under random subsampling, varying sample sizes, and varying numbers of clusters. LDA reproduces the manual labels most precisely but does not rank top in internal evaluation. PhenoGraph and FlowSOM outperform the other unsupervised tools in precision, coherence, and stability. PhenoGraph and Xshift are more robust when detecting refined sub-clusters, whereas DEPECHE and FlowSOM tend to group similar clusters into meta-clusters. The performance of PhenoGraph, Xshift, and flowMeans is affected by increasing sample size, whereas FlowSOM remains relatively stable as sample size increases.

Conclusion: All evaluations, including precision, coherence, stability, and clustering resolution, should be considered together when choosing an appropriate tool for cytometry data analysis. We therefore provide decision guidelines based on these characteristics to help general readers choose the most suitable clustering tools.
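The three classes of measures named in the abstract can be illustrated with a small, self-contained sketch. The abstract does not say which specific metrics were used, so every choice below is an assumption: the adjusted Rand index (ARI) against manual labels stands in for “precision”, the Davies-Bouldin index (lower is better) stands in for “coherence”, and “stability” is taken to be the mean and spread of ARI over repeated random subsamples. k-means, one of the nine methods compared, serves as the clustering step, here with a deterministic farthest-first initialization rather than the more common randomized k-means++. All function names are hypothetical helpers, not the authors' code, and the toy data is synthetic 2-D points, not mass cytometry measurements.

```python
import math
import random
from collections import Counter
from statistics import mean, pstdev


def kmeans(points, k, seed=0, iters=100):
    """Lloyd's k-means with farthest-first initialization (hypothetical helper)."""
    rng = random.Random(seed)
    # Farthest-first init: a random first center, then repeatedly the point
    # farthest from all centers chosen so far (deterministic but outlier-sensitive).
    centers = [points[rng.randrange(len(points))]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        # Assign each point to its nearest center (Euclidean distance).
        new = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new == labels:
            break  # assignments stopped changing: converged
        labels = new
        # Move each center to the mean of its members; empty clusters keep their center.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def _pairs(n):
    return n * (n - 1) / 2


def adjusted_rand_index(truth, pred):
    """External agreement between two labelings (1.0 = identical up to renaming)."""
    sum_ij = sum(_pairs(v) for v in Counter(zip(truth, pred)).values())
    sum_a = sum(_pairs(v) for v in Counter(truth).values())
    sum_b = sum(_pairs(v) for v in Counter(pred).values())
    expected = sum_a * sum_b / _pairs(len(truth))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings are one cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def davies_bouldin(points, labels):
    """Internal coherence: mean worst-case ratio of within-cluster scatter to
    between-centroid distance (lower is better; needs at least 2 clusters)."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    cent = {g: tuple(sum(d) / len(m) for d in zip(*m)) for g, m in groups.items()}
    scatter = {g: mean(math.dist(p, cent[g]) for p in m) for g, m in groups.items()}
    return mean(
        max((scatter[i] + scatter[j]) / math.dist(cent[i], cent[j]) for j in groups if j != i)
        for i in groups
    )


def subsample_stability(points, truth, k, rounds=10, frac=0.5, seed=0):
    """Stability (assumed definition): mean and spread of ARI over random subsamples."""
    rng = random.Random(seed)
    scores = []
    for r in range(rounds):
        idx = rng.sample(range(len(points)), int(frac * len(points)))
        pred = kmeans([points[i] for i in idx], k, seed=r)
        scores.append(adjusted_rand_index([truth[i] for i in idx], pred))
    return mean(scores), pstdev(scores)


def make_blobs(n_per=60, sd=0.5, seed=1):
    """Synthetic stand-in for gated data: three 2-D Gaussian blobs with 'manual' labels."""
    rng = random.Random(seed)
    pts, manual = [], []
    for lab, (cx, cy) in enumerate([(0.0, 0.0), (8.0, 0.0), (0.0, 8.0)]):
        for _ in range(n_per):
            pts.append((rng.gauss(cx, sd), rng.gauss(cy, sd)))
            manual.append(lab)
    return pts, manual


if __name__ == "__main__":
    points, manual = make_blobs()
    labels = kmeans(points, k=3)
    print("precision (ARI vs manual labels):", adjusted_rand_index(manual, labels))
    print("coherence (Davies-Bouldin):", round(davies_bouldin(points, labels), 3))
    print("stability (mean, sd of ARI):", subsample_stability(points, manual, k=3))
```

On this well-separated toy data every measure comes out near its ideal value; the point is the shape of the evaluation loop, not the numbers. Any other clustering function with the same signature as `kmeans` could be dropped into `subsample_stability` to compare methods under the same protocol.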

Updated: 2019-12-01