Using the “Hidden” genome to improve classification of cancer types
Biometrics (IF 1.9), Pub Date: 2020-09-11, DOI: 10.1111/biom.13367
Saptarshi Chakraborty, Colin B Begg, Ronglai Shen

It is increasingly common clinically for cancer specimens to be examined using techniques that identify somatic mutations. In principle, these mutational profiles can be used to diagnose the tissue of origin, a critical task for the 3% to 5% of tumors that have an unknown primary site. Diagnosis of primary site is also critical for screening tests that employ circulating DNA. However, most mutations observed in any new tumor are very rarely occurring mutations, and indeed the preponderance of these may never have been observed in any previous recorded tumor. To create a viable diagnostic tool we need to harness the information content in this “hidden genome” of variants for which no direct information is available. To accomplish this we propose a multilevel meta-feature regression to extract the critical information from rare variants in the training data in a way that permits us to also extract diagnostic information from any previously unobserved variants in the new tumor sample. A scalable implementation of the model is obtained by combining a high-dimensional feature screening approach with a group-lasso penalized maximum likelihood approach based on an equivalent mixed-effect representation of the multilevel model. We apply the method to the Cancer Genome Atlas whole-exome sequencing data set including 3702 tumor samples across seven common cancer sites. Results show that our multilevel approach can harness substantial diagnostic information from the hidden genome.

Updated: 2020-09-11