RNA-Seq-Based Breast Cancer Subtypes Classification Using Machine Learning Approaches
Computational Intelligence and Neuroscience Pub Date : 2020-10-29 , DOI: 10.1155/2020/4737969
Zhezhou Yu 1 , Zhuo Wang 1 , Xiangchun Yu 1, 2 , Zhe Zhang 1

Background. Breast invasive carcinoma (BRCA) is not a single disease, as each subtype has a distinct morphological structure. Although several computational methods have been proposed for breast cancer subtype identification, understanding of the specific gene interaction mechanisms involved in the subtypes remains incomplete. Identifying and exploring these mechanisms for each subtype could have an important impact on personalized treatment for different patients.

Methods. We integrate the biological importance of genes, derived from gene regulatory networks, into the differential expression analysis and obtain weighted differentially expressed genes (weighted DEGs). A gene with a high weight regulates more target genes and thus holds more biological importance. In addition, we construct gene coexpression networks for the control and experiment groups; the significantly different interaction structures between them motivated us to design a Gene Ontology (GO) enrichment based on gene coexpression networks (GOEGCN). The GOEGCN performs a two-sided distinction analysis between the coexpression networks of the control and experiment groups, which allows us to study how the modulated coexpressed gene pairs affect biological functions at the GO level.

Results. We modeled a binary classification with weighted DEGs for each subtype. The binary classifiers made good predictions for unseen samples, and the experimental results validated the effectiveness of the proposed approaches. The novel enriched GO terms based on GOEGCN for the control and experiment groups of each subtype explain, to some extent, the specific changes in biological function associated with the two-sided distinction of coexpression network structures.

Conclusion. The weighted DEGs contain biological importance derived from the gene regulatory network. Based on the weighted DEGs, five binary classifiers were learned and showed good performance on the “Sensitivity,” “Specificity,” “Accuracy,” “F1,” and “AUC” metrics. The GOEGCN with weighted DEGs for the control and experiment groups yielded novel GO enrichment results, and the novel enriched GO terms would, to some extent, further unveil the changes in specific biological functions among the BRCA subtypes. The R code in this research is available at https://github.com/yxchspring/GOEGCN_BRCA_Subtypes.
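
For readers unfamiliar with the five metrics named in the Conclusion, the following is a minimal sketch of how they are conventionally computed for one binary classifier. It is an illustration only and is not taken from the authors' R code: the function name `binary_metrics`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int],
                   scores: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Standard binary-classification metrics (hypothetical helper).

    y_true: 1 for the subtype of interest, 0 for all other samples.
    scores: classifier scores; higher means more likely to be class 1.
    threshold: assumed cut-off for turning a score into a predicted label.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

    # AUC as the probability that a randomly chosen positive sample scores
    # higher than a randomly chosen negative one (ties count as one half).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if pos and neg:
        wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
                   for p in pos for n in neg)
        auc = wins / (len(pos) * len(neg))
    else:
        auc = float("nan")  # undefined when one class is absent

    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "f1": f1, "auc": auc}


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    labels = [1, 1, 1, 0, 0, 0]
    preds = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
    print(binary_metrics(labels, preds))
```

The pairwise AUC computation is quadratic in the number of samples, which is acceptable for a sketch; a rank-based formulation would be the usual choice for larger cohorts.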

Updated: 2020-10-30