FS-GBDT: identification multicancer-risk module via a feature selection algorithm by integrating Fisher score and GBDT.
Briefings in Bioinformatics ( IF 6.8 ) Pub Date : 2020-09-07 , DOI: 10.1093/bib/bbaa189
Jialin Zhang 1 , Da Xu 1 , Kaijing Hao 1 , Yusen Zhang 2 , Wei Chen 1 , Jiaguo Liu 1 , Rui Gao 3 , Chuanyan Wu 4 , Yang De Marinis 5

Cancer is a highly heterogeneous disease caused by dysregulation in different cell types and tissues. However, different cancers may share common mechanisms. It is critical to identify decisive genes involved in the development and progression of cancer, and joint analysis of multiple cancers may help to discover overlapping mechanisms among them. In this study, we propose a fusion feature selection framework based on an ensemble method, named Fisher score and Gradient Boosting Decision Tree (FS–GBDT), to select robust and decisive feature genes in high-dimensional gene expression datasets. Joint analysis of 11 human cancer types was conducted to explore the key feature gene subset of cancer. To verify the efficacy of FS–GBDT, we compared it with four other common feature selection algorithms using a Support Vector Machine (SVM) classifier. FS–GBDT achieved the highest scores on the evaluation indicators, outperforming the other four methods. In addition, we performed gene ontology analysis and literature validation of the key gene subset, and the subset was classified into several functional modules. Functional modules can serve as disease markers in place of single genes, which are difficult to find reproducibly in gene chip applications, and can be used to study the core mechanisms of cancer.
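
The abstract does not say how the Fisher score and GBDT are combined, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes a two-stage scheme: features are first pre-filtered by Fisher score, then ranked by GBDT feature importance, and the selected subset is evaluated with an SVM. The helper names (`fisher_score`, `fs_gbdt_select`), the cut-offs (`n_prefilter`, `n_final`), and the synthetic data standing in for gene expression profiles are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def fisher_score(X, y):
    """Per-feature Fisher score: between-class scatter over within-class scatter."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        between += Xc.shape[0] * (Xc.mean(axis=0) - overall_mean) ** 2
        within += Xc.shape[0] * Xc.var(axis=0)
    # Small constant avoids division by zero for constant features.
    return between / (within + 1e-12)


def fs_gbdt_select(X, y, n_prefilter=50, n_final=10, random_state=0):
    """Hypothetical two-stage selection; returns column indices into X."""
    # Stage 1 (assumed): keep the n_prefilter features with the highest Fisher score.
    scores = fisher_score(X, y)
    prefiltered = np.argsort(scores)[::-1][:n_prefilter]
    # Stage 2 (assumed): rank the survivors by GBDT feature importance.
    gbdt = GradientBoostingClassifier(n_estimators=50, random_state=random_state)
    gbdt.fit(X[:, prefiltered], y)
    order = np.argsort(gbdt.feature_importances_)[::-1][:n_final]
    return prefiltered[order]


# Synthetic stand-in for a samples-by-genes expression matrix (not real data).
X, y = make_classification(
    n_samples=200, n_features=200, n_informative=10,
    n_redundant=0, n_classes=2, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Select features on the training split only, then score an SVM on held-out data.
selected = fs_gbdt_select(X_train, y_train)
svm = SVC(kernel="linear").fit(X_train[:, selected], y_train)
accuracy = svm.score(X_test[:, selected], y_test)
print("selected feature indices:", selected)
print("held-out SVM accuracy:", accuracy)
```

Selecting features on the training split alone keeps the held-out accuracy from being inflated by information leaking from the test samples. The numbers printed here come from synthetic data and say nothing about the results reported in the paper.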

Updated: 2020-09-08