Classification of Cancer Types Using Graph Convolutional Neural Networks
Frontiers in Physics ( IF 1.9 ) Pub Date : 2020-05-08 , DOI: 10.3389/fphy.2020.00203
Ricardo Ramirez 1 , Yu-Chiao Chiu 2 , Allen Hererra 1 , Milad Mostavi 1 , Joshua Ramirez 1 , Yidong Chen 2, 3 , Yufei Huang 1, 3 , Yu-Fang Jin 1

Background: Cancer is a leading cause of death in the United States and carries significant health care costs. Accurate prediction of cancers at an early stage and an understanding of the genomic mechanisms that drive cancer development are vital to improving treatment outcomes and survival rates, with significant social and economic impact. Over the past two decades, attempts have been made to classify cancer types with machine learning techniques, and more recently with deep learning approaches.

Results: In this paper, we established four graph convolutional neural network (GCNN) models that take unstructured gene expression as input and classify tumor and non-tumor samples into one of 33 designated cancer types or as normal. The four models are based on a co-expression graph, a co-expression+singleton graph, a protein-protein interaction (PPI) graph, and a PPI+singleton graph. They were trained and tested on a combined 10,340 cancer samples and 731 normal tissue samples from The Cancer Genome Atlas (TCGA) dataset. The GCNN models achieved prediction accuracies of 89.9–94.7% across 34 classes (33 cancer types and a normal group). In silico gene-perturbation experiments were performed on all four models. The co-expression GCNN model was further interpreted to identify a total of 428 marker genes that drive the classification of the 33 cancer types and normal. The concordance of differential expression of these markers between the represented cancer type and the others was confirmed. Successful classification of cancer types and a normal group, regardless of the normal tissues' origin, suggests that the identified markers are cancer-specific rather than tissue-specific.

Conclusion: Novel GCNN models have been established to predict cancer types or normal tissue from gene expression profiles. Results on the TCGA dataset demonstrate that these models can produce accurate classification (above 94%) using cancer-specific marker genes. The models and the source code are publicly available and can be readily adapted by the data-driven modeling research community to the diagnosis of cancer and other diseases.
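
The abstract does not say how the co-expression graph is built, so the following is only a minimal sketch of one common construction, not the authors' implementation. It assumes that two genes are connected when the absolute Pearson correlation of their expression profiles exceeds a threshold. The function names, the toy gene names and values, and the threshold of 0.6 are all hypothetical choices made for illustration. The `keep_singletons` flag loosely mirrors the distinction between the plain graph and the "+singleton" graph mentioned in the Results, again as an assumption.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def coexpression_graph(expr, threshold=0.6, keep_singletons=True):
    """Build an undirected gene graph from expression profiles.

    expr maps a gene name to its expression values across samples.
    Returns a dict mapping each gene to the set of its neighbours.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    adj = {gene: set() for gene in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) > threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    if not keep_singletons:
        # Drop genes that have no co-expressed partner.
        adj = {gene: nbrs for gene, nbrs in adj.items() if nbrs}
    return adj


# Toy data (made up): three genes measured in four samples.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0],
    "GENE_C": [5.0, 1.0, 4.0, 2.0],
}

graph_with_singletons = coexpression_graph(expr)
graph_without_singletons = coexpression_graph(expr, keep_singletons=False)
```

In this toy example only GENE_A and GENE_B are strongly correlated, so GENE_C remains an isolated node when singletons are kept and is dropped otherwise. A graph of this kind would then supply the adjacency structure over which a GCNN propagates gene-expression features.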




Updated: 2020-05-08