Polishing copy number variant calls on exome sequencing data via deep learning
Genome Research (IF 6.2), Pub Date: 2022-06-01, DOI: 10.1101/gr.274845.120
Furkan Özden 1, Can Alkan 1, A Ercüment Çiçek 1,2

Accurate and efficient detection of copy number variants (CNVs) is of critical importance owing to their significant association with complex genetic diseases. Although algorithms that use whole-genome sequencing (WGS) data provide stable results with mostly valid statistical assumptions, copy number detection on whole-exome sequencing (WES) data shows comparatively lower accuracy. This is unfortunate as WES data are cost-efficient, compact, and relatively ubiquitous. The bottleneck is primarily due to the noncontiguous nature of the targeted capture: biases in targeted genomic hybridization, GC content, targeting probes, and sample batching during sequencing. Here, we present a novel deep learning model, DECoNT, which uses the matched WES and WGS data, and learns to correct the copy number variations reported by any off-the-shelf WES-based germline CNV caller. We train DECoNT on the 1000 Genomes Project data, and we show that we can efficiently triple the duplication call precision and double the deletion call precision of the state-of-the-art algorithms. We also show that our model consistently improves the performance independent of (1) sequencing technology, (2) exome capture kit, and (3) CNV caller. Using DECoNT as a universal exome CNV call polisher has the potential to improve the reliability of germline CNV detection on WES data sets.
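The precision figures quoted in the abstract are per-class: of the calls a WES-based caller labels as a duplication (or deletion), what fraction agree with the matched WGS-derived label. A minimal sketch of that computation follows, assuming calls and ground truth are already aligned per region and encoded as the strings `DEL`, `DUP`, and `NO-CALL`. The function name, the label encoding, and the toy data are illustrative assumptions, not DECoNT's actual interface or results from the paper.

```python
from collections import Counter

LABELS = ("DEL", "DUP")  # assumed label encoding, not taken from the paper


def per_class_precision(calls, truth):
    """Return precision per CNV class for aligned call/truth label lists.

    Precision for a class = (calls of that class matching truth) /
    (all calls of that class). Classes that were never called are omitted.
    """
    if len(calls) != len(truth):
        raise ValueError("calls and truth must have the same length")
    called = Counter(calls)
    correct = Counter(c for c, t in zip(calls, truth) if c == t)
    return {
        label: correct[label] / called[label]
        for label in LABELS
        if called[label] > 0
    }


# Toy data invented for illustration only.
wes_calls = ["DUP", "DUP", "DUP", "DEL", "DEL", "NO-CALL"]
wgs_truth = ["DUP", "NO-CALL", "NO-CALL", "DEL", "NO-CALL", "NO-CALL"]
# Hypothetical output of a polishing step that relabels false calls.
polished = ["DUP", "NO-CALL", "NO-CALL", "DEL", "NO-CALL", "NO-CALL"]

raw_precision = per_class_precision(wes_calls, wgs_truth)
polished_precision = per_class_precision(polished, wgs_truth)
```

Comparing `raw_precision` with `polished_precision` per class is one way to express the kind of improvement the abstract reports; recall would need a separate check, since a polisher that discards calls can raise precision while losing true events.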

Updated: 2022-06-01