Fast tree aggregation for consensus hierarchical clustering.
BMC Bioinformatics (IF 2.9), Pub Date: 2020-03-20, DOI: 10.1186/s12859-020-3453-6
Audrey Hulot (1, 2, 3), Julien Chiquet (2), Florence Jaffrézic (1), Guillem Rigaill (4, 5, 6)

BACKGROUND In unsupervised learning and clustering, integrating data from different sources and types is a difficult question discussed in several research areas. For instance, in omics analysis, dozens of clustering methods have been developed in the past decade. When a single source of data is at play, hierarchical clustering (HC) is extremely popular, as a tree structure is highly interpretable and arguably more informative than just a partition of the data. However, blindly applying HC to multiple sources of data raises computational and interpretation issues.

RESULTS We propose mergeTrees, a method that aggregates a set of trees with the same leaves to create a consensus tree. In our consensus tree, a cluster at height h contains the individuals that are in the same cluster for all the trees at height h. The method is exact and proven to be O(nq log(n)), n being the number of individuals and q being the number of trees to aggregate. Our implementation is extremely efficient in simulations, allowing us to process many large trees at a time. We also rely on mergeTrees to perform the cluster analysis of two real omics data sets, introducing a spectral variant as an efficient and robust by-product.

CONCLUSIONS Our tree aggregation method can be used in conjunction with hierarchical clustering to perform efficient cluster analysis. This approach was found to be robust to the absence of clustering information in some of the data sets, as well as to increased variability within true clusters. The method is implemented in R/C++ and available as an R package named mergeTrees, which makes it easy to integrate into existing or new pipelines in several research areas.
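The consensus rule stated in the Results (a cluster at height h contains the individuals that share a cluster in every tree at height h) can be illustrated with a small Python sketch. It assumes each tree is encoded as a list of merge events `(height, leaf_a, leaf_b)`; this encoding and the names `cut` and `consensus_cut` are hypothetical and are not the interface of the mergeTrees R package. The sketch cuts each tree at h with a union-find structure and intersects the resulting partitions. It is a naive check of the definition, not the fast exact algorithm the authors propose.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical tree encoding (not mergeTrees' input format): a list of merge
# events (height, leaf_a, leaf_b). At `height`, the cluster holding leaf_a
# fuses with the cluster holding leaf_b.
Merge = Tuple[float, int, int]


def cut(tree: Sequence[Merge], n: int, h: float) -> List[int]:
    """Return one cluster label per leaf for the partition of `tree` at height h."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for height, a, b in tree:
        if height <= h:  # only merges at or below h are applied
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
    return [find(i) for i in range(n)]


def consensus_cut(trees: Sequence[Sequence[Merge]], n: int, h: float) -> List[List[int]]:
    """Clusters at height h of the consensus: leaves that share a cluster in every tree."""
    labels = [cut(t, n, h) for t in trees]
    groups: Dict[Tuple[int, ...], List[int]] = {}
    for i in range(n):
        # Two leaves get the same key iff they share a cluster in all q trees.
        key = tuple(lab[i] for lab in labels)
        groups.setdefault(key, []).append(i)
    return list(groups.values())


if __name__ == "__main__":
    t1 = [(1.0, 0, 1), (2.0, 2, 3), (3.0, 0, 2)]
    t2 = [(1.0, 0, 1), (1.5, 1, 2), (3.0, 0, 3)]
    print(consensus_cut([t1, t2], 4, 2.0))  # [[0, 1], [2], [3]]
```

In the usage example, leaves 2 and 3 are joined at h = 2 in the first tree only, and leaf 2 joins {0, 1} in the second tree only, so neither grouping survives in the consensus. Equivalently, two leaves first share a consensus cluster at the largest of their per-tree merge heights, which is why the consensus remains a valid hierarchy.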

Updated: 2020-04-22