An evaluation of a novel approach for clustering genes with dissimilar replicates, Communications in Statistics - Simulation and Computation



Communications in Statistics - Simulation and Computation (IF 0.8), Pub Date: 2020-12-08
Ozan Cinar, Cem Iyigun, Ozlem Ilk

Abstract

Clustering genes is a step in microarray studies that demands several considerations. First, expression levels may be collected as time series, which should be accounted for appropriately. Second, genes may behave differently in different biological replicates due to their genetic backgrounds. Highlighting such genes may deepen the study; however, it introduces further complexity into clustering. The third concern stems from the presence of a large number of constant genes, which imposes a heavy computational burden. Finally, the number of clusters is not known in advance; therefore, a clustering algorithm should be able to recommend a meaningful number of clusters. In this study, we use a simulation study to evaluate a recently proposed clustering algorithm that promises to address these issues. The methodology treats each gene as a combination of its replications and accounts for the time dependency. It also computes cluster validation scores to suggest possible numbers of clusters. Results show that the methodology is able to find the clusters and highlight the genes with differences among the replications, separate the constant genes to reduce the computational burden, and suggest a meaningful number of clusters. Furthermore, our results show that traditional distance metrics are not effective at clustering short time series correctly.
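The workflow the abstract outlines (combine a gene's replicates into one object, set aside constant genes, then score several candidate numbers of clusters) can be illustrated with a small sketch. This is not the algorithm evaluated in the paper. Everything here is an assumption made for illustration: the simulated data, the use of first differences to reflect time ordering, the range threshold `tol`, the plain k-means with farthest-first initialisation, and the silhouette score as the validation measure.

```python
import math
import random

def simulate(seed=1):
    """Hypothetical data: 4 patterns x 10 genes, 2 replicates, 5 time points."""
    rng = random.Random(seed)
    up, down, flat = [0, 1, 2, 3, 4], [4, 3, 2, 1, 0], [2, 2, 2, 2, 2]
    # The third pattern rises in one replicate and falls in the other.
    patterns = [(up, up), (down, down), (up, down), (flat, flat)]
    return [[[x + rng.gauss(0, 0.05) for x in rep] for rep in pair]
            for pair in patterns for _ in range(10)]

def is_constant(replicates, tol=0.5):
    """Assumed rule: expression range stays below tol in every replicate."""
    return all(max(s) - min(s) < tol for s in replicates)

def gene_features(replicates):
    """Represent a gene by the first differences of all its replicates."""
    return [b - a for s in replicates for a, b in zip(s, s[1:])]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def assign(points, centers):
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points]

def kmeans(points, k, iters=20):
    # Farthest-first initialisation keeps the sketch deterministic.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centers)

def silhouette(points, labels):
    """Mean silhouette width; singletons score 0 by convention."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

genes = simulate()
varying = [g for g in genes if not is_constant(g)]   # drop constant genes first
points = [gene_features(g) for g in varying]
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)                  # suggested number of clusters
labels = kmeans(points, best_k)
```

Because each gene's feature vector concatenates both replicates, a gene that rises in one replicate and falls in the other ends up far from genes that rise in both, so it can form its own cluster. Filtering constant genes before clustering shrinks the set of points whose pairwise distances must be computed.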


Updated: 2020-12-08
