Genotype error biases trio-based estimates of haplotype phase accuracy
American Journal of Human Genetics (IF 9.8), Pub Date: 2022-06-02, DOI: 10.1016/j.ajhg.2022.04.019
Brian L Browning, Sharon R Browning

Haplotypes can be estimated from unphased genotype data via statistical methods. When parent-offspring trios are available for inferring the true phase from Mendelian inheritance rules, the accuracy of statistical phasing is usually measured by the switch error rate, which is the proportion of pairs of consecutive heterozygotes that are incorrectly phased. We present a method for estimating the genotype error rate from parent-offspring trios and a method for estimating the bias that occurs in the observed switch error rate as a result of genotype error. We apply these methods to 485,301 genotyped UK Biobank samples that include 898 White British trios and to 38,387 sequenced TOPMed samples that include 217 African Caribbean trios and 669 European American trios. We show that genotype error inflates the observed switch error rate and that the relative bias increases with sample size. For the UK Biobank White British trios, the observed switch error rate in the trio offspring is 2.4 times larger than the estimated true switch error rate (1.4 × 10⁻³ vs. 5.8 × 10⁻⁴). We propose an alternate definition of phase error that counts two consecutive switch errors as a single error because back-to-back switch errors arise when a single heterozygote is incorrectly phased with respect to the surrounding heterozygotes. With this definition, we estimate that the average distance between phase errors is 64 megabases in the UK Biobank White British individuals.
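
The two error definitions in the abstract can be illustrated with a minimal Python sketch. This is not the authors' software. It assumes each heterozygous site in one individual is coded as 0 or 1 according to which haplotype carries the alternate allele, and the function names (`switch_flags`, `switch_error_rate`, `merged_phase_errors`) are hypothetical. The abstract does not say how runs of three or more consecutive switch errors are handled, so the left-to-right pairing used here is an assumption.

```python
from typing import List, Sequence


def switch_flags(true_phase: Sequence[int], est_phase: Sequence[int]) -> List[bool]:
    """Flag each pair of consecutive heterozygotes that is incorrectly phased.

    Each sequence holds one 0/1 value per heterozygous site (assumed coding:
    which haplotype carries the alternate allele).
    """
    if len(true_phase) != len(est_phase):
        raise ValueError("phase vectors must have the same length")
    # A site "agrees" when the estimated orientation matches the true one.
    agree = [t == e for t, e in zip(true_phase, est_phase)]
    # A switch error occurs where agreement changes between neighbours.
    return [agree[i] != agree[i + 1] for i in range(len(agree) - 1)]


def switch_error_rate(true_phase: Sequence[int], est_phase: Sequence[int]) -> float:
    """Proportion of consecutive heterozygote pairs that are incorrectly phased."""
    flags = switch_flags(true_phase, est_phase)
    if not flags:
        raise ValueError("need at least two heterozygotes")
    return sum(flags) / len(flags)


def merged_phase_errors(true_phase: Sequence[int], est_phase: Sequence[int]) -> int:
    """Count phase errors, treating two back-to-back switch errors as one.

    Assumption: longer runs are paired greedily from left to right.
    """
    flags = switch_flags(true_phase, est_phase)
    errors = 0
    i = 0
    while i < len(flags):
        if flags[i]:
            errors += 1
            # Absorb an immediately following switch error into this one.
            if i + 1 < len(flags) and flags[i + 1]:
                i += 1
        i += 1
    return errors


# Toy example: site 2 is flipped on its own, and a real switch occurs
# between sites 3 and 4.
true_phase = [0, 0, 0, 0, 0, 0]
est_phase = [0, 0, 1, 0, 1, 1]
rate = switch_error_rate(true_phase, est_phase)
n_errors = merged_phase_errors(true_phase, est_phase)
```

In the toy example, the single mis-phased heterozygote at site 2 produces two of the three switch errors; under the merged definition it counts once, so three switch errors become two phase errors.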




Updated: 2022-06-03