Row and Column Structure-Based Biclustering for Gene Expression Data
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 3.6), Pub Date: 2020-09-07, DOI: 10.1109/tcbb.2020.3022085
Subin Qian 1 , Huiyi Liu 1 , Xiaofeng Yuan 2 , Wei Wei 3 , Shuangshuang Chen 2 , Hong Yan 4

High-throughput technologies for gene analysis have drawn considerable attention to biclustering methods. However, existing methods suffer from high time and space complexity. This paper proposes a biclustering method, called Row and Column Structure-based Biclustering (RCSBC), that finds checkerboard patterns in microarray data with low time and space complexity. First, the paper describes the structure of a bicluster in terms of the structure of rows and columns. Second, it selects representative rows and columns using two algorithms. Finally, the gene expression data are biclustered in the space spanned by the representative rows and columns. To the best of our knowledge, this paper is the first to exploit the relationship between the row/column structure of a gene expression matrix and the structure of biclusters. Both synthetic and real-life gene expression datasets are used to validate the effectiveness of the method. The experimental results show that RCSBC outperforms state-of-the-art algorithms in both clustering accuracy and time/space complexity. This study offers new insights into biclustering large-scale gene expression data without loading the whole dataset into memory.
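
The three steps named in the abstract (describe structure, pick representative rows and columns, bicluster in the space they span) can be illustrated with a toy sketch. This is not the RCSBC algorithm: the abstract does not say how the two selection algorithms work, so greedy farthest-point selection and nearest-representative assignment are stand-ins chosen here for illustration, and every function name below is hypothetical. The sketch also assumes a noise-free checkerboard matrix and that the numbers of row and column clusters are known in advance.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_indices(vectors: Matrix, k: int) -> List[int]:
    """Stand-in selection rule (an assumption, not the paper's algorithm):
    start from index 0, then repeatedly add the vector that is farthest
    from everything chosen so far."""
    chosen = [0]
    min_d = [sq_dist(v, vectors[0]) for v in vectors]
    while len(chosen) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_d[i])
        chosen.append(nxt)
        min_d = [min(d, sq_dist(v, vectors[nxt])) for d, v in zip(min_d, vectors)]
    return chosen


def assign_to_nearest(vectors: Matrix, reps: Matrix) -> List[int]:
    """Label each vector with the index of its nearest representative."""
    return [min(range(len(reps)), key=lambda j: sq_dist(v, reps[j])) for v in vectors]


def bicluster_sketch(data: Matrix, n_row_clusters: int,
                     n_col_clusters: int) -> Tuple[List[int], List[int]]:
    cols = [list(c) for c in zip(*data)]  # transpose: one vector per column
    rep_rows = farthest_point_indices(data, n_row_clusters)
    rep_cols = farthest_point_indices(cols, n_col_clusters)
    # Describe each row only by its values on the representative columns,
    # and each column only by its values on the representative rows.
    row_feats = [[row[j] for j in rep_cols] for row in data]
    col_feats = [[col[i] for i in rep_rows] for col in cols]
    row_labels = assign_to_nearest(row_feats, [row_feats[i] for i in rep_rows])
    col_labels = assign_to_nearest(col_feats, [col_feats[j] for j in rep_cols])
    return row_labels, col_labels


# Toy checkerboard: entry (i, j) depends only on the row and column cluster.
block_means = [[1.0, 8.0], [6.0, 2.0]]
true_rows = [0, 1, 0, 1]
true_cols = [0, 0, 1, 1, 0]
data = [[block_means[r][c] for c in true_cols] for r in true_rows]

row_labels, col_labels = bicluster_sketch(data, 2, 2)
print(row_labels, col_labels)
```

On this toy matrix the recovered labels coincide with `true_rows` and `true_cols`. Once the representatives are fixed, each row is compared on only `n_col_clusters` values rather than on every column, which is the intuition behind working in a reduced space. The stand-in selection step here still scans the full matrix, so the sketch makes no claim about the memory behaviour reported in the paper.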

Updated: 2020-09-07