A machine learning framework to trace tumor tissue-of-origin of 13 types of cancer based on DNA somatic mutation.
Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease (IF 6.2), Pub Date: 2020-08-07, DOI: 10.1016/j.bbadis.2020.165916
Bingsheng He 1 , Chan Dai 2 , Jidong Lang 2 , Pingping Bing 1 , Geng Tian 2 , Bo Wang 2 , Jialiang Yang 3

Carcinoma of unknown primary (CUP), defined as metastatic cancer whose site of origin cannot be identified, affects 3–5 of every 100 cancer patients in the United States. Tumor heterogeneity and metastasis make the subsequent diagnosis and treatment of CUP very difficult. Multiple methods have been proposed to find the tissue-of-origin (TOO) of CUP. However, computed tomography (CT) and positron emission tomography (PET) identify the TOO with accuracies of only 20%–27% and 24%–40%, respectively, which is not enough to guide targeted therapy. In this study, we present a machine learning framework that traces tumor tissue-of-origin from gene length-normalized somatic mutation sequencing data. Somatic mutation data were downloaded from the Data Portal (Release 28) of the International Cancer Genome Consortium (ICGC), and 4909 samples covering 13 cancer types were used to identify the primary site of each cancer. The best results were obtained with a 600-gene set and the random forest algorithm under 10-fold cross-validation, giving an average accuracy of 0.8822 and an average F1-score of 0.8886 across the 13 cancer types. In conclusion, we provide an effective computational framework that infers cancer tissue-of-origin by combining DNA sequencing with machine learning, and it is promising for assisting the clinical diagnosis of cancer.
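The abstract does not spell out how features are built or how performance is averaged, so the following is only a minimal Python sketch under stated assumptions, not the authors' pipeline. It assumes that "gene length-normalized" means somatic mutation counts per kilobase of gene length, that the 10 cross-validation folds are simple shuffled (not stratified) splits, and that the "average F1-score" is a macro average over cancer types. The function names, gene names, and gene lengths are hypothetical. The random forest itself is omitted; in practice one would fit a library implementation such as scikit-learn's RandomForestClassifier on the training folds of each split.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def length_normalized_features(
    mutated_genes: Sequence[str],
    gene_lengths: Dict[str, int],
    gene_panel: Sequence[str],
) -> List[float]:
    """Mutation count per kilobase for each gene in the panel (one sample).

    Assumption: per-kilobase scaling; gene lengths must be positive.
    """
    counts = Counter(mutated_genes)  # one entry per observed somatic mutation
    return [counts[g] / (gene_lengths[g] / 1000.0) for g in gene_panel]


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds, start = [], 0
    for i in range(k):
        # The first (n_samples % k) folds get one extra sample.
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical genes and lengths, for illustration only.
    lengths = {"GENE_A": 2000, "GENE_B": 1000, "GENE_C": 4000}
    print(length_normalized_features(
        ["GENE_A", "GENE_A", "GENE_B"], lengths, ["GENE_A", "GENE_B", "GENE_C"]))
    # [1.0, 1.0, 0.0]
    print([len(f) for f in k_fold_indices(4909, k=10)])
    # [491, 491, 491, 491, 491, 491, 491, 491, 491, 490]
    print(round(macro_f1(["lung", "lung", "breast", "liver"],
                         ["lung", "breast", "breast", "liver"]), 4))
    # 0.7778
```

With 4909 samples, each of the 10 folds holds about 490 samples; each fold serves once as the test set while a classifier is trained on the remaining nine, and the per-fold scores are then averaged.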



Updated: 2020-08-18