ClusterMine: A knowledge-integrated clustering approach based on expression profiles of gene sets
Journal of Bioinformatics and Computational Biology (IF 1), Pub Date: 2020-07-23, DOI: 10.1142/s0219720020400090
Hong-Dong Li 1, Yunpei Xu 1, Xiaoshu Zhu 1,2, Quan Liu 1, Gilbert S Omenn 3, Jianxin Wang 1

Clustering analysis of gene expression data is essential for understanding complex biological data and is widely used in important biological applications such as the identification of cell subpopulations and disease subtypes. Commonly used methods such as hierarchical clustering (HC) and consensus clustering (CC) typically assess the similarity between samples using the holistic expression profiles of all genes. While these methods have proven successful in identifying sample clusters in many areas, they do not indicate which gene sets (functions) contribute most to the clustering, which limits the interpretability of the resulting clusters. We hypothesize that integrating prior knowledge of annotated gene sets would not only achieve satisfactory clustering performance but also, more importantly, enable biological interpretation of the clusters. Here we report ClusterMine, an approach that identifies clusters by assessing functional similarity between samples, integrating known annotated gene sets from functional annotation databases such as Gene Ontology. In addition to the cluster membership of each sample, as provided by conventional approaches, it outputs the gene sets that most likely contribute to the clustering, thus facilitating biological interpretation. We compare ClusterMine with conventional approaches on nine real-world experimental datasets that represent different application scenarios in biology. We find that ClusterMine achieves better clustering performance than these approaches and that the gene sets prioritized by our method are biologically meaningful. ClusterMine is implemented as an R package and is freely available at: www.genemine.org/clustermine.php
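
The abstract does not spell out the algorithm, so the following Python sketch only illustrates the general idea of gene-set-based clustering; it is not ClusterMine's actual implementation and does not use the R package's API. Everything in it is an assumption made for illustration: the function names, the use of Pearson correlation within each gene set, averaging the per-set similarities into one "functional similarity", average-linkage merging, and the within-minus-between score used to rank gene sets. The expression values and the gene sets `setA` and `setB` are made-up toy data.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = (sum(a * a for a in dx) * sum(b * b for b in dy)) ** 0.5
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def set_similarity(expr, genes):
    """Sample-by-sample correlation computed only on the genes of one gene set."""
    sim = {}
    for s, t in combinations(list(expr), 2):
        r = pearson([expr[s][g] for g in genes], [expr[t][g] for g in genes])
        sim[(s, t)] = sim[(t, s)] = r
    return sim


def average_linkage(samples, sim, k):
    """Repeatedly merge the two most similar clusters until k clusters remain."""
    clusters = [[s] for s in samples]

    def link(a, b):
        return mean(sim[(s, t)] for s in a for t in b)

    while len(clusters) > k:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters[j]  # i < j, so deleting j keeps index i valid
        del clusters[j]
    return clusters


def cluster_by_gene_sets(expr, gene_sets, k):
    """Return (sample -> cluster label, gene-set names ranked by contribution)."""
    samples = list(expr)
    pairs = list(combinations(samples, 2))
    per_set = {name: set_similarity(expr, genes) for name, genes in gene_sets.items()}

    # Assumed aggregation: functional similarity = mean of per-gene-set similarities.
    combined = {}
    for s, t in pairs:
        v = mean(per_set[name][(s, t)] for name in per_set)
        combined[(s, t)] = combined[(t, s)] = v

    clusters = average_linkage(samples, combined, k)
    labels = {s: i for i, members in enumerate(clusters) for s in members}

    # Assumed ranking score: mean within-cluster minus mean between-cluster similarity.
    scores = {}
    for name, sim in per_set.items():
        within = [sim[p] for p in pairs if labels[p[0]] == labels[p[1]]]
        between = [sim[p] for p in pairs if labels[p[0]] != labels[p[1]]]
        scores[name] = (mean(within) if within else 0.0) - (mean(between) if between else 0.0)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return labels, ranked


# Toy data: setA separates the "a" samples from the "b" samples; setB does not.
expr = {
    "a1": {"g1": 1, "g2": 5, "g3": 9, "g4": 2, "g5": 0, "g6": 2, "g7": 0},
    "a2": {"g1": 2, "g2": 6, "g3": 9, "g4": 2, "g5": 2, "g6": 0, "g7": 0},
    "b1": {"g1": 9, "g2": 5, "g3": 1, "g4": 2, "g5": 0, "g6": 0, "g7": 2},
    "b2": {"g1": 8, "g2": 6, "g3": 2, "g4": 0, "g5": 2, "g6": 2, "g7": 0},
}
gene_sets = {"setA": ["g1", "g2", "g3"], "setB": ["g4", "g5", "g6", "g7"]}

labels, ranked = cluster_by_gene_sets(expr, gene_sets, k=2)
print(labels)
print(ranked)
```

The sketch returns the two kinds of output the abstract describes: a cluster label per sample and a ranking of gene sets by how strongly they support the partition. Any real analysis should rely on the published R package rather than on this illustration.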

Updated: 2020-07-23