Minimum error calibration and normalization for genomic copy number analysis.
Genomics ( IF 3.4 ) Pub Date : 2020-05-13 , DOI: 10.1016/j.ygeno.2020.05.008
Bo Gao 1 , Michael Baudis 1

Background

Copy number variations (CNV) are regional deviations from the normal autosomal bi-allelic DNA content. While germline CNVs are a major contributor to genomic syndromes and inherited diseases, the majority of cancers accumulate extensive “somatic” CNVs (sCNV or CNA) during oncogenic transformation and progression. While specific sCNVs have been closely associated with tumorigenesis, many neoplasias intriguingly exhibit recurrent sCNV patterns beyond the involvement of a few cancer driver genes. Currently, CNV profiles of tumor samples are generated using genomic microarrays or high-throughput DNA sequencing. Regardless of the underlying technology, genomic copy number data are derived from the relative assessment and integration of multiple signals, and the data generation process is prone to contamination from several sources. Estimated copy number values have no absolute or strictly linear correlation to their corresponding DNA levels, and the extent of deviation differs between sample profiles, which poses a great challenge for data integration and comparison in large-scale genome analysis.
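One common reason why measured values are not linear in the true copy number is the admixture of normal cells in a tumor sample. The sketch below illustrates this with the standard textbook mixing model; it is not taken from the paper. The function name, the `purity` parameter, and the simplification that signals are normalized against a diploid reference are all assumptions made for illustration.

```python
import math


def expected_log2_ratio(copy_number, purity, normal_copies=2):
    """Expected log2 ratio of a segment in a tumor/normal mixture.

    Assumption (illustrative only): the signal is the purity-weighted
    average of tumor and normal copy numbers, divided by a diploid
    reference. Real data have further distortions.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    mixed = purity * copy_number + (1.0 - purity) * normal_copies
    if mixed <= 0:
        # Homozygous deletion in a pure sample: no signal at all.
        return float("-inf")
    return math.log2(mixed / normal_copies)


if __name__ == "__main__":
    # The same single-copy gain (3 copies) at different tumor purities.
    for purity in (1.0, 0.7, 0.4):
        print(purity, round(expected_log2_ratio(3, purity), 3))
```

Under this model the same single-copy gain produces a different value in every sample, depending on purity, which is why profiles cannot be compared directly without calibration.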

Results

In this study, we present a novel method named “Minimum Error Calibration and Normalization for Copy Numbers Analysis” (Mecan4CNA). It requires only CNV segmentation files as input, is platform independent, and performs well with limited hardware requirements. For a given multi-sample copy number dataset, Mecan4CNA can batch-normalize all samples to the corresponding true copy number levels of the main tumor clones. Experiments with Mecan4CNA on simulated data showed an overall accuracy of 93% and 91% in determining the normal level and the single copy alteration (i.e. duplication or loss of one allele), respectively. Comparison of estimated normal levels and single copy alterations with existing methods and karyotyping data on the NCI-60 tumor cell lines produced coherent results. To estimate the method's impact on downstream analyses, we performed GISTIC analyses on the original and Mecan4CNA-normalized data from The Cancer Genome Atlas (TCGA); the normalized data showed prominent improvements in both sensitivity and specificity in detecting focal regions.
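To make the idea of normalizing to a normal level and a single copy alteration concrete, the following is a minimal sketch of one possible calibration step. It is not Mecan4CNA's implementation or API. The function name and parameters are hypothetical, both reference levels are assumed to have been estimated already, and the rescaling is assumed to be linear, which is a simplification.

```python
def calibrate_segments(values, normal_level, single_copy_level):
    """Rescale segment values so that the normal level maps to 0
    and a single-copy gain maps to 1.

    Hypothetical helper for illustration: `normal_level` and
    `single_copy_level` stand for per-sample estimates that a tool
    such as Mecan4CNA would provide; the linear mapping is an assumption.
    """
    step = single_copy_level - normal_level
    if step == 0:
        raise ValueError("single_copy_level must differ from normal_level")
    return [(v - normal_level) / step for v in values]


if __name__ == "__main__":
    # Two samples with different baselines and signal compression.
    sample_a = calibrate_segments([0.10, 0.40, -0.20], 0.10, 0.40)
    sample_b = calibrate_segments([-0.05, 0.15, -0.25], -0.05, 0.15)
    # After calibration both samples are on the same scale.
    print([round(x, 2) for x in sample_a])
    print([round(x, 2) for x in sample_b])
```

Once every sample is expressed relative to its own normal level and single-copy step, segments from different samples and platforms can be compared on a shared scale, which is the precondition for cohort-level analyses such as GISTIC.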

Conclusions

Mecan4CNA provides an advanced method for CNA data normalization, especially in meta-analyses involving large numbers of profiles and source data of heterogeneous quality. With its informative output and visualization options, Mecan4CNA can also improve the interpretation of individual CNA profiles. Mecan4CNA is freely available as a Python package and through its code repository on GitHub.




Updated: 2020-06-23