Detecting copy number alterations in RNA-Seq using SuperFreq
bioRxiv - Bioinformatics. Pub Date: 2020-06-01, DOI: 10.1101/2020.05.31.126888
Christoffer Flensburg, Alicia Oshlack, Ian J. Majewski

Calling copy number alterations (CNAs) from RNA-Seq is challenging because differences in gene expression mean that read depth varies across genes by several orders of magnitude, and because informative single nucleotide polymorphisms (SNPs) are scarce. We previously developed SuperFreq to analyse exome data from tumours by combining variant calling and copy number estimation in an integrated pipeline. Here we apply the SuperFreq framework to RNA sequencing (RNA-Seq) data, which allows the detection of absolute and allele-sensitive CNAs. SuperFreq uses an error-propagation framework to combine, and make the most of, the information available in read depth and in the B-allele frequencies (BAFs) of SNPs when making CNA calls on RNA-Seq data. We used data from The Cancer Genome Atlas (TCGA) to compare the CNAs called from RNA-Seq with those generated from SNP arrays. When ploidy estimates were consistent, we found excellent agreement with CNAs called from DNA: over 98% of the genome for acute myeloid leukaemia (TCGA-AML, n=116) and 87% for colorectal cancer (TCGA-CRC, n=377), which has a much higher CNA burden. As expected, the sensitivity of CNA calling from RNA-Seq depended on gene density. Nonetheless, using RNA-Seq, SuperFreq detected 78% of CNA calls covering 100 or more genes, with a precision of 94%. Recall dropped markedly for focal events, but this also depended on signal intensity. For example, in the CRC cohort SuperFreq identified 100% (7/7) of cases with high-level amplification of ERBB2, where the copy number was typically >20, but only 6% (1/17) of cases with moderate amplification of IGF2, typically 4 or 5 copies over a smaller region (median of 5 flanking genes for IGF2, compared to 20 for ERBB2). We were able to reproduce the relationship between mutational load and CNA profile in CRC using RNA-Seq alone.
SuperFreq offers an integrated platform for identifying CNAs and point mutations in cancer transcriptomes from RNA-Seq data.
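
To illustrate why read depth and BAF carry complementary information for absolute, allele-sensitive calls, the sketch below computes the signal one would expect at a heterozygous SNP for a given allele-specific copy number state. It is a textbook-style model, not SuperFreq's implementation: the function names, the `purity` parameter, the diploid normal background and the assumption of a single clonal event are all illustrative assumptions, and the model ignores the expression variability that makes RNA-Seq harder than DNA. The `recall` helper simply reproduces the arithmetic behind the ERBB2 and IGF2 percentages quoted above.

```python
import math


def expected_signal(major, minor, purity=1.0):
    """Expected (log2 depth ratio, BAF) for a clonal CNA at a heterozygous SNP.

    Illustrative assumptions (not taken from SuperFreq): the normal
    background is diploid with one copy of each allele, and a fraction
    `purity` of cells carries `major` + `minor` copies.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if major < 0 or minor < 0 or minor > major:
        raise ValueError("need 0 <= minor <= major")

    # Average copies per cell across tumour and normal cells.
    total = purity * (major + minor) + (1.0 - purity) * 2.0
    minor_copies = purity * minor + (1.0 - purity) * 1.0

    if total == 0:
        # Homozygous deletion in a pure sample: no reads are expected.
        return float("-inf"), float("nan")

    log2_ratio = math.log2(total / 2.0)  # depth relative to diploid
    baf = minor_copies / total           # fraction of reads from the minor allele
    return log2_ratio, baf


def recall(detected, total):
    """Fraction of true events that were detected."""
    return detected / total


if __name__ == "__main__":
    states = {
        "normal (1+1)": (1, 1),
        "gain (2+1)": (2, 1),
        "loss (1+0)": (1, 0),
        "copy-neutral LOH (2+0)": (2, 0),
    }
    for name, (major, minor) in states.items():
        lfc, baf = expected_signal(major, minor, purity=1.0)
        print(f"{name}: log2 ratio = {lfc:.2f}, BAF = {baf:.2f}")

    print(f"ERBB2 recall: {recall(7, 7):.0%}")
    print(f"IGF2 recall: {recall(1, 17):.0%}")
```

In this simplified model, copy-neutral loss of heterozygosity leaves the depth ratio unchanged and shows up only in the BAF, which is one reason a caller would want to use both signals together.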

Updated: 2020-06-01