Estimating linkage disequilibrium from genotypes under Hardy-Weinberg equilibrium. BMC Genetics



Estimating linkage disequilibrium from genotypes under Hardy-Weinberg equilibrium.
BMC Genetics. Pub Date: 2020-02-26, DOI: 10.1186/s12863-020-0818-9
Tin-Yu J Hui, Austin Burt


BACKGROUND: Measures of linkage disequilibrium (LD) play a key role in a wide range of applications, from disease association to demographic history estimation. The true population LD cannot be measured directly; it can only be inferred from genetic samples, which are unavoidably subject to measurement error. Previous studies of r² (a measure of LD), such as those on its variance and on the bias due to finite sample size, were based on the special case in which the true population-wide LD is zero. These results generally do not hold for non-zero values of the true LD, which are more common in real genetic data.

RESULTS: This work generalises the estimation of r² to all levels of LD, for both phased and unphased data. First, we provide new formulae for the effect of finite sample size on the observed r² values. Second, we find a new empirical formula for the variance of the observed r², equal to 2E[r²](1 - E[r²])/n, where n is the diploid sample size. Third, we propose a new routine, Constrained ML, a likelihood-based method that directly estimates haplotype frequencies and r² from diploid genotypes under Hardy-Weinberg Equilibrium. While serving the same purpose as the pre-existing Expectation-Maximisation algorithm, the new routine can have better convergence and is simpler to use. A new likelihood-ratio test is also introduced to test for the absence of a particular haplotype. Extensive simulations are run to support these findings.

CONCLUSION: Most inferences on LD will benefit from our new findings, from point and interval estimation to hypothesis testing. Genetic analyses utilising r² information will become more accurate as a result.
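To make the quantities in the abstract concrete, here is a minimal Python sketch. It is not the authors' Constrained ML routine, whose details the abstract does not give. It assumes two biallelic loci (alleles A/a and B/b), random union of haplotypes (Hardy-Weinberg Equilibrium), allele frequencies fixed at their sample values, and a plain grid search over the LD coefficient D. All function names are hypothetical, and the counts in the usage lines are idealised expected counts, not real data.

```python
import math

def haplotype_freqs(p_a, p_b, d):
    """Frequencies of haplotypes AB, Ab, aB, ab from allele frequencies and D."""
    return [
        max(0.0, p_a * p_b + d),
        max(0.0, p_a * (1 - p_b) - d),
        max(0.0, (1 - p_a) * p_b - d),
        max(0.0, (1 - p_a) * (1 - p_b) + d),
    ]

def r_squared(p_a, p_b, d):
    """r^2 = D^2 / (pA (1 - pA) pB (1 - pB))."""
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

def r2_variance(expected_r2, n):
    """Empirical variance formula from the abstract: 2 E[r^2] (1 - E[r^2]) / n."""
    return 2 * expected_r2 * (1 - expected_r2) / n

def genotype_probs(p_a, p_b, d):
    """3x3 table: probs[i][j] = P(i copies of A, j copies of B) under HWE."""
    h = haplotype_freqs(p_a, p_b, d)
    alleles = [(1, 1), (1, 0), (0, 1), (0, 0)]  # (A count, B count) per haplotype
    probs = [[0.0] * 3 for _ in range(3)]
    for x, (a1, b1) in enumerate(alleles):
        for y, (a2, b2) in enumerate(alleles):
            probs[a1 + a2][b1 + b2] += h[x] * h[y]
    return probs

def log_likelihood(counts, p_a, p_b, d):
    """Multinomial log-likelihood of a 3x3 genotype count table (constant dropped)."""
    probs = genotype_probs(p_a, p_b, d)
    total = 0.0
    for i in range(3):
        for j in range(3):
            if counts[i][j] > 0:
                if probs[i][j] <= 0.0:
                    return -math.inf
                total += counts[i][j] * math.log(probs[i][j])
    return total

def estimate_d(counts, steps=2000):
    """Grid-search estimate of D, with allele frequencies fixed at sample values.

    Assumes both loci are polymorphic in the sample.
    """
    n = sum(sum(row) for row in counts)
    p_a = sum(i * counts[i][j] for i in range(3) for j in range(3)) / (2 * n)
    p_b = sum(j * counts[i][j] for i in range(3) for j in range(3)) / (2 * n)
    # D is bounded so that all four haplotype frequencies stay non-negative.
    lo = max(-p_a * p_b, -(1 - p_a) * (1 - p_b))
    hi = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    d_hat = max(grid, key=lambda d: log_likelihood(counts, p_a, p_b, d))
    return p_a, p_b, d_hat

# Usage with idealised expected counts for n = 1000 diploid individuals.
n = 1000
counts = [[n * p for p in row] for row in genotype_probs(0.3, 0.4, 0.05)]
p_a, p_b, d_hat = estimate_d(counts)
r2_hat = r_squared(p_a, p_b, d_hat)
print(r2_hat, r2_variance(r2_hat, n))
```

A grid search is used here only because the likelihood in D can have more than one local maximum for unphased data. The last line plugs the point estimate into the variance formula in place of E[r²], which is itself an approximation.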


Updated: 2020-04-22
