Unsupervised Feature Selection Using an Integrated Strategy of Hierarchical Clustering With Singular Value Decomposition: An Integrative Biomarker Discovery Method With Application to Acute Myeloid Leukemia
IEEE/ACM Transactions on Computational Biology and Bioinformatics (IF 3.6), Pub Date: 2021-09-08, DOI: 10.1109/tcbb.2021.3110989
Tapas Bhadra, Saurav Mallik, Amir Sohel, Zhongming Zhao

In this article, we propose a novel unsupervised feature selection method that combines hierarchical feature clustering with singular value decomposition (SVD). The proposed algorithm first generates several feature clusters by applying hierarchical clustering to the feature space, and then applies SVD to each cluster to identify the feature that contributes most to the SVD-entropy. The method selects an optimal feature subset that not only minimizes the mutual dependency among the selected features but also, to some extent, maximizes the mutual dependency between each selected feature and its nearest-neighbor non-selected features. Each selected feature also contributes the maximum SVD-entropy among all features in its cluster. The experimental results show that the proposed algorithm compares favorably with several state-of-the-art feature selection methods on evaluation criteria such as classification accuracy, redundancy rate, and representation entropy. We demonstrate the algorithm on Acute Myeloid Leukemia (AML) multi-omics data from The Cancer Genome Atlas (TCGA), comprising five datasets: gene expression, exon expression, methylation, microRNA, and pathway activity (PARADIGM IPLs). Our analysis pinpoints a candidate gene marker for AML, EREG, supported by integrative omics evidence. In these datasets, EREG is targeted by two top-ranked microRNAs, hsa-miR-1286 and hsa-miR-1976. The method and results will be useful for biomarker discovery in the era of precision medicine.
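
The two-stage procedure described in the abstract (cluster the features, then keep one feature per cluster by SVD-entropy contribution) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not specify the distance measure, the linkage rule, how the number of clusters is chosen, or how a feature's contribution is computed, so the sketch assumes a distance of 1 - |Pearson correlation|, average linkage, a user-supplied `n_clusters`, and a leave-one-out contribution (entropy of the cluster minus entropy of the cluster without that feature). The function names `svd_entropy` and `select_features` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def svd_entropy(X):
    """Normalized Shannon entropy of the squared singular values of X."""
    s = np.linalg.svd(X, compute_uv=False)
    total = np.sum(s ** 2)
    if len(s) < 2 or total == 0:
        return 0.0  # a single singular value carries no entropy
    p = s ** 2 / total
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log(p)) / np.log(len(s)))


def select_features(X, n_clusters):
    """Pick one feature (column index) per hierarchical feature cluster.

    Assumptions (not taken from the paper): distance = 1 - |correlation|,
    average linkage, and leave-one-out SVD-entropy contribution.
    Columns of X are assumed to be non-constant.
    """
    corr = np.corrcoef(X, rowvar=False)
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")

    selected = []
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        if len(members) == 1:
            selected.append(int(members[0]))
            continue
        block = X[:, members]
        full = svd_entropy(block)
        # Contribution of feature j: drop in entropy when j is removed.
        contrib = [full - svd_entropy(np.delete(block, j, axis=1))
                   for j in range(len(members))]
        selected.append(int(members[int(np.argmax(contrib))]))
    return sorted(selected)
```

Calling `select_features(X, n_clusters=k)` on a samples-by-features matrix `X` returns at most `k` column indices, one per cluster. Because each cluster groups mutually correlated features, keeping a single representative per cluster is what keeps redundancy among the selected features low.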

Updated: 2021-09-08