Detecting rare copy number variants from Illumina genotyping arrays with the CamCNV pipeline: Segmentation of z‐scores improves detection and reliability
Genetic Epidemiology (IF 1.7), Pub Date: 2020-10-05, DOI: 10.1002/gepi.22367
Joe Dennis 1, Logan Walker 2, Jonathan Tyrer 3, Kyriaki Michailidou 1,4,5, Douglas F Easton 1

The intensities from genotyping array data can be used to detect copy number variants (CNVs), but a high level of noise in the data and overlap between the intensity distributions of different copy numbers produce unreliable calls, particularly when the CNV covers only a few probes. We present a novel pipeline (CamCNV) with a series of steps to reduce noise and more reliably detect CNVs covering as few as three probes. The pipeline aims to detect rare CNVs (below 1% frequency) for association tests in large cohorts. The method uses the information from all samples to convert intensities to z‐scores, thus adjusting for variance between probes. We tested the sensitivity of our pipeline by looking for known CNVs from the 1000 Genomes Project in our genotyping of 1000 Genomes samples. We also compared the CNV calls for 1661 pairs of genotyped replicate samples. At the chosen mean z‐score cut‐off, sensitivity to detect the 1000 Genomes CNVs was approximately 85% for deletions and 65% for duplications. From the replicates, we estimate that the false discovery rate is controlled at ∼10% for deletions (falling to below 3% for CNVs covering more than five probes) and ∼28% for duplications. The pipeline demonstrates improved sensitivity compared with calling with PennCNV, particularly for short deletions covering only a few probes. For each called CNV, the mean z‐score is a useful metric for controlling the false discovery rate.
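
The z‐score conversion described in the abstract can be illustrated with a minimal sketch. This is not the CamCNV implementation: the segmentation step is omitted, the data are simulated, and the function names, the simulated deletion, and the cut‐off value of -2 are all assumptions made for illustration. The sketch only shows how standardising each probe across all samples puts probes with different noise levels on a common scale, and how a mean z‐score over a candidate region could then be compared with a cut‐off.

```python
import numpy as np

def to_z_scores(intensities):
    """Standardise each probe (column) using all samples (rows)."""
    probe_mean = intensities.mean(axis=0)
    probe_sd = intensities.std(axis=0)
    return (intensities - probe_mean) / probe_sd

def mean_z_score(z, sample, start, end):
    """Mean z-score for one sample over probes start..end-1."""
    return float(z[sample, start:end].mean())

# Simulated intensities (hypothetical): 200 samples x 30 probes,
# with a different noise level for each probe.
rng = np.random.default_rng(0)
probe_noise = rng.uniform(0.05, 0.15, size=30)
intensities = rng.normal(0.0, probe_noise, size=(200, 30))

# Simulated three-probe deletion in sample 0 (assumed signal level).
intensities[0, 10:13] = -0.6

z = to_z_scores(intensities)
score = mean_z_score(z, 0, 10, 13)

CUTOFF = -2.0  # illustrative value only; not taken from the paper
print(f"mean z-score over probes 10-12: {score:.2f}")
print("deletion candidate" if score < CUTOFF else "no call")
```

Because each probe is scaled by its own standard deviation, a noisy probe contributes less extreme z‐scores than a quiet probe for the same raw intensity shift, which is the adjustment for between‐probe variance that the abstract refers to.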

Updated: 2020-10-05