Accurate fetal variant calling in the presence of maternal cell contamination.
European Journal of Human Genetics (IF 3.7), Pub Date: 2020-07-29, DOI: 10.1038/s41431-020-0697-6
Elena Nabieva, Satyarth Mishra Sharma, Yermek Kapushev, Sofya K Garushyants, Anna V Fedotova, Viktoria N Moskalenko, Tatyana E Serebrenikova, Eugene Glazyrina, Ilya V Kanivets, Denis V Pyankov, Tatyana V Neretina, Maria D Logacheva, Georgii A Bazykin, Dmitry Yarotsky

High-throughput sequencing of fetal DNA is a promising and increasingly common method for the discovery of all (or all coding) genetic variants in the fetus, either as part of prenatal screening or diagnosis, or for genetic diagnosis of spontaneous abortions. In many cases, the fetal DNA (from chorionic villi, amniotic fluid, or abortive tissue) can be contaminated with maternal cells, resulting in a mixture of fetal and maternal DNA. This maternal cell contamination (MCC) undermines the assumption, made by traditional variant callers, that each allele at a heterozygous site is covered, on average, by 50% of the reads, and can therefore lead to erroneous genotype calls. We present a panel of methods for reducing the genotyping error in the presence of MCC. All methods start with the output of GATK HaplotypeCaller on the sequencing data for the (contaminated) fetal sample and both of its parents, and additionally rely on information about the MCC fraction (which itself is readily estimated from the high-throughput sequencing data). The first of these methods uses a Bayesian probabilistic model to correct the fetal genotype calls produced by the MCC-unaware HaplotypeCaller. The other two methods “learn” the genotype-correction model from examples. We use simulated contaminated fetal data to train and test the models. Using the test sets, we show that all three methods substantially improve accuracy compared with the original MCC-unaware HaplotypeCaller calls. We then apply the best-performing method to three chorionic villus samples from spontaneously terminated pregnancies.
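The abstract describes the effect of MCC and the Bayesian correction only in words. The sketch below is a minimal illustration of the idea and is not the authors' model.

It makes several simplifying assumptions:

- Genotypes are coded as ALT-allele counts (0, 1, 2).
- Parental genotypes are treated as known hard calls.
- Reads at a site are modeled as independent binomial draws with a hypothetical symmetric read-error rate `error`.
- The MCC fraction is estimated from sites where the mother is 0/0 and the father is 1/1. This is one plausible approach and not necessarily the one used in the paper.

All function names are hypothetical.

```python
import math


def expected_alt_fraction(fetal_gt, maternal_gt, mcc, error=0.01):
    """Expected ALT-read fraction in a fetal sample mixed with maternal DNA.

    Genotypes are ALT-allele counts (0, 1, 2); `mcc` is the fraction of the
    sample's DNA from maternal cells. `error` is a hypothetical symmetric
    per-read error rate (assumed 0 < error < 0.5 when used for likelihoods).
    """
    # Mixture of fetal and maternal allele fractions, weighted by MCC.
    p = (1 - mcc) * fetal_gt / 2 + mcc * maternal_gt / 2
    # A read error flips REF to ALT (or ALT to REF) with probability `error`.
    return p * (1 - error) + (1 - p) * error


def mendelian_prior(maternal_gt, paternal_gt):
    """Prior over fetal genotypes implied by Mendelian transmission."""
    pm = maternal_gt / 2  # probability the mother transmits ALT
    pf = paternal_gt / 2  # probability the father transmits ALT
    return {
        0: (1 - pm) * (1 - pf),
        1: pm * (1 - pf) + (1 - pm) * pf,
        2: pm * pf,
    }


def fetal_genotype_posterior(ref_reads, alt_reads, maternal_gt, paternal_gt,
                             mcc, error=0.01):
    """Posterior over fetal genotypes {0, 1, 2} from read counts at one site."""
    log_post = {}
    for gt, prior in mendelian_prior(maternal_gt, paternal_gt).items():
        if prior == 0:
            continue  # genotype impossible given the parental genotypes
        p = expected_alt_fraction(gt, maternal_gt, mcc, error)
        # Binomial log-likelihood; the binomial coefficient is the same for
        # every genotype, so it cancels after normalization and is omitted.
        log_post[gt] = (math.log(prior) + alt_reads * math.log(p)
                        + ref_reads * math.log(1 - p))
    top = max(log_post.values())  # subtract the max to avoid underflow
    weights = {gt: math.exp(v - top) for gt, v in log_post.items()}
    total = sum(weights.values())
    return {gt: weights.get(gt, 0.0) / total for gt in (0, 1, 2)}


def estimate_mcc(informative_sites):
    """Estimate MCC from (ref_reads, alt_reads) pairs at sites where the
    mother is 0/0 and the father is 1/1. There the fetus must be
    heterozygous, so the expected ALT fraction is (1 - mcc) / 2.
    Read errors and mapping bias are ignored."""
    ref = sum(r for r, _ in informative_sites)
    alt = sum(a for _, a in informative_sites)
    mcc = 1 - 2 * alt / (ref + alt)
    return min(max(mcc, 0.0), 1.0)  # clamp sampling noise into [0, 1]


if __name__ == "__main__":
    # Mother 0/0, father 0/1, 50% MCC: a fetal heterozygote is expected to
    # show about 25% ALT reads rather than 50%.
    print(fetal_genotype_posterior(ref_reads=75, alt_reads=25,
                                   maternal_gt=0, paternal_gt=1, mcc=0.5))
```

In this example, the posterior concentrates almost entirely on the heterozygous genotype. A caller that assumes 50% allele balance has no such expectation for a 25% ALT fraction.

Working in log space and subtracting the maximum keeps the computation stable at high sequencing depth, where raw binomial likelihoods would underflow.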

Updated: 2020-07-29