Multi-cancer samples clustering via graph regularized low-rank representation method under sparse and symmetric constraints


BMC Bioinformatics (IF 2.9), Pub Date: 2019-12-30, DOI: 10.1186/s12859-019-3231-5
Juan Wang, Cong-Hai Lu, Jin-Xing Liu, Ling-Yun Dai, Xiang-Zhen Kong


BACKGROUND: Identifying different types of cancer from gene expression data has become a hotspot in bioinformatics research. Clustering gene expression samples from multiple cancers into their respective classes is an important approach to this task. However, gene expression data are high-dimensional, contain few samples, and are noisy, which makes data mining and analysis difficult. Although many effective and feasible methods address this problem, they may still have shortcomings.

RESULTS: In this paper, we propose the graph regularized low-rank representation under symmetric and sparse constraints (sgLRR) method, which introduces graph regularization based on manifold learning, together with symmetric and sparse constraints, into traditional low-rank representation (LRR). The symmetric and sparse constraints alleviate the effect of noise in the raw data on the low-rank representation. In addition, graph regularization allows sgLRR to preserve the important intrinsic local geometric structure of the raw data. We apply the method to cluster multi-cancer samples based on gene expression data, which improves clustering quality. First, the gene expression data are decomposed by sgLRR, yielding a lowest-rank representation matrix that is symmetric and sparse. Then, an affinity matrix is constructed, and a spectral clustering algorithm, namely normalized cuts (Ncuts), is applied to it to complete the multi-cancer sample clustering.

CONCLUSIONS: A series of comparative experiments demonstrates that the sgLRR method, which is based on low-rank representation, has a great advantage and performs remarkably well in clustering multi-cancer samples.
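The pipeline described in the abstract (representation matrix, then affinity matrix, then spectral clustering) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions. The sgLRR solver is replaced by the closed-form solution of noiseless LRR, Z = V_r V_r^T, which is symmetric but applies no sparsity or graph regularization term. The affinity is assumed to be built from the absolute values of Z. The clustering step uses a common normalized-cut relaxation (eigenvectors of the normalized affinity followed by k-means), which may differ from the Ncuts variant used in the paper. All function names, the synthetic data, and the chosen rank are hypothetical.

```python
import numpy as np


def lowrank_representation(X, rank):
    """Stand-in for sgLRR (assumption): closed-form noiseless LRR, Z = V_r V_r^T.

    X has shape (n_features, n_samples). Z is (n_samples, n_samples) and
    symmetric by construction, but no sparse or graph term is applied.
    """
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    v_r = vt[:rank].T
    return v_r @ v_r.T


def affinity_from_representation(Z):
    """Symmetric, non-negative affinity built from the representation (assumption)."""
    return 0.5 * (np.abs(Z) + np.abs(Z.T))


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(points), dtype=int)
    for it in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def normalized_spectral_clustering(W, k):
    """Normalized-cut relaxation: top-k eigenvectors of D^-1/2 W D^-1/2, then k-means."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    M = 0.5 * (M + M.T)  # remove round-off asymmetry before eigh
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    U = eigvecs[:, -k:]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)


def make_subspace_data(n_clusters=3, dim=3, n_per=20, n_features=50, seed=0):
    """Synthetic, noiseless 'expression' matrix: each cluster lies in its own subspace."""
    rng = np.random.default_rng(seed)
    blocks, labels = [], []
    for c in range(n_clusters):
        basis = rng.standard_normal((n_features, dim))
        coeffs = rng.standard_normal((dim, n_per))
        blocks.append(basis @ coeffs)
        labels += [c] * n_per
    return np.hstack(blocks), np.array(labels)


# Toy run: 3 clusters of 20 samples, total subspace dimension 3 * 3 = 9.
X, truth = make_subspace_data()
Z = lowrank_representation(X, rank=9)
pred = normalized_spectral_clustering(affinity_from_representation(Z), k=3)
print(pred)
```

On real gene expression data the samples are noisy and the rank is unknown, which is the situation the sparse, symmetric, and graph terms of sgLRR are meant to handle. The stand-in above only shows how a symmetric representation matrix feeds the clustering step.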


Updated: 2019-12-30
