Identification of leukemia stem cell expression signatures through Monte Carlo feature selection strategy and support vector machine.
Cancer Gene Therapy (IF 6.4), Pub Date: 2019-05-29, DOI: 10.1038/s41417-019-0105-y
JiaRui Li 1, 2 , Lin Lu 3 , Yu-Hang Zhang 1 , YaoChen Xu 4 , Min Liu 5 , KaiYan Feng 6 , Lei Chen 5, 7 , XiangYin Kong 1 , Tao Huang 1 , Yu-Dong Cai 2

Acute myeloid leukemia (AML) is a type of blood cancer characterized by the rapid growth of immature white blood cells from the bone marrow. Therapy resistance resulting from the persistence of leukemia stem cells (LSCs) is found in numerous patients. Comparative transcriptome studies have previously been conducted to analyze differentially expressed genes between LSC+ and LSC- cells. However, these studies mainly focused on a limited number of genes with the most obvious expression differences between the two cell types. We developed a computational approach incorporating several machine learning algorithms, including Monte Carlo feature selection (MCFS), incremental feature selection (IFS), support vector machine (SVM), and Repeated Incremental Pruning to Produce Error Reduction (RIPPER), to identify gene expression features specific to LSCs. First, 1159 features (genes) were identified that can be used to build the optimal SVM classifier for distinguishing LSC+ from LSC- cells. Among these 1159 genes, the top 17 were identified as LSC-specific biomarkers. In addition, six classification rules were produced by the RIPPER algorithm. A subsequent literature review of these genes and classification rules, together with functional enrichment analyses of the 1159 genes, confirmed the relevance of the extracted genes and rules to the characteristics of LSCs.
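
The incremental feature selection (IFS) step named in the abstract can be illustrated with a minimal sketch. It assumes a feature ranking is already available (in the paper this would come from MCFS) and grows the feature set one ranked feature at a time, keeping the prefix that scores best. Everything concrete here is an assumption for illustration and is not taken from the paper: the toy data, the function names, the leave-one-out evaluation, and the nearest-centroid classifier, which stands in for the SVM so that the example runs with the Python standard library alone.

```python
from statistics import mean


def nearest_centroid_accuracy(X, y, feature_idx):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the given feature indices (stand-in for an SVM)."""
    correct = 0
    for i in range(len(X)):
        # Build one centroid per class from every sample except sample i.
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [mean(r[f] for r in rows) for f in feature_idx]
        point = [X[i][f] for f in feature_idx]
        # Predict the class whose centroid is closest (squared Euclidean distance).
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features, score=nearest_centroid_accuracy):
    """Evaluate the top-1, top-2, ... ranked features and return the
    smallest prefix that reaches the best score, with that score."""
    best_k, best_score = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        s = score(X, y, ranked_features[:k])
        if s > best_score:  # strict '>' keeps the smallest best prefix
            best_k, best_score = k, s
    return ranked_features[:best_k], best_score


# Hypothetical toy expression matrix: rows are samples, columns are genes.
# Gene 0 separates the two classes; genes 1 and 2 are noise.
X = [
    [5.0, 1.0, 0.2],
    [5.2, 0.1, 0.9],
    [4.9, 0.7, 0.4],
    [1.0, 0.9, 0.3],
    [1.1, 0.2, 0.8],
    [0.8, 0.6, 0.5],
]
y = ["LSC+", "LSC+", "LSC+", "LSC-", "LSC-", "LSC-"]
ranking = [0, 2, 1]  # assumed output of a feature-ranking step such as MCFS

selected, best = incremental_feature_selection(X, y, ranking)
print(selected, best)
```

On this toy input the single top-ranked gene already classifies every held-out sample correctly, so the selected prefix has length one. With real expression data the `score` argument could be replaced by a cross-validated SVM, which is closer to what the abstract describes.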

Updated: 2019-11-18