HLA haplotype frequency estimation for heterogeneous populations using a graph-based imputation algorithm, Human Immunology


Human Immunology (IF 2.7), Pub Date: 2021-07-26, DOI: 10.1016/j.humimm.2021.07.001
Sapir Israeli, Loren Gragert, Martin Maiers, Yoram Louzoun


HLA haplotype frequencies are estimated from ambiguous unphased HLA genotyping data using Expectation-Maximization (EM) algorithms. Current population genetics methods require independent EM frequency estimates for each population, and assume that each population is in Hardy-Weinberg Equilibrium (HWE). The HWE assumption of EM has thus far resulted in the exclusion of individuals from mixed or unknown ethnic backgrounds from reference datasets. Multi-region populations are currently poorly served by stem cell donor registry HLA imputation and matching implementations due to the inability of such algorithms to incorporate admixture into their population genetics models. To address this unmet need, we have expanded the imputation component of our GRaph IMputation and Matching (GRIMM) framework, where imputation becomes the expectation step in an iterative EM algorithm. Our novel multi-region EM implementation considers region as a Bayesian prior, enabling integration of HLA information from multiple single-region population groups, and for the first time including individuals with ambiguous or mixed ethnic backgrounds. We show that our multi-region EM produces much higher likelihood values and better haplotype recovery as measured by Kullback-Leibler divergence than all evaluated EM implementations when tested on real datasets of US donor registry HLA typings as well as simulated multi-region datasets of ambiguous HLA typings.
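The idea of treating region as a prior inside an EM loop can be illustrated with a toy example. The sketch below is not the GRIMM implementation: the function names `multi_region_em` and `kl_divergence`, the input layout (each individual as a list of candidate phased haplotype pairs plus a region prior), and the assumption that the two haplotypes draw their regions independently from that prior are all hypothetical choices made for illustration. The expectation step weights every (haplotype pair, region pair) explanation of an individual, and the maximization step renormalizes the expected counts per region.

```python
from collections import defaultdict
from itertools import product
import math


def multi_region_em(individuals, n_iter=50):
    """Toy multi-region EM (illustrative only, not the GRIMM code).

    individuals: list of (pairs, region_prior) where
      pairs        -- candidate phased pairs (hap1, hap2) consistent
                      with the ambiguous typing of that individual
      region_prior -- dict region -> prior probability (sums to 1)
    Returns dict region -> dict haplotype -> estimated frequency.
    """
    regions = sorted({r for _, prior in individuals for r in prior})
    haps = sorted({h for pairs, _ in individuals for pair in pairs for h in pair})
    # Start from uniform frequencies in every region.
    freq = {r: {h: 1.0 / len(haps) for h in haps} for r in regions}

    for _ in range(n_iter):
        counts = {r: defaultdict(float) for r in regions}
        for pairs, prior in individuals:
            # E-step: weight each (haplotype pair, region pair) explanation.
            # Assumption: each haplotype draws its region independently.
            weighted = []
            for (h1, h2), (r1, r2) in product(pairs, product(prior, repeat=2)):
                w = prior[r1] * prior[r2] * freq[r1][h1] * freq[r2][h2]
                weighted.append((h1, r1, h2, r2, w))
            total = sum(w for *_, w in weighted)
            if total == 0.0:
                continue  # no explanation has support under current estimates
            for h1, r1, h2, r2, w in weighted:
                counts[r1][h1] += w / total
                counts[r2][h2] += w / total
        # M-step: renormalize expected counts within each region.
        for r in regions:
            norm = sum(counts[r].values())
            if norm > 0.0:
                freq[r] = {h: counts[r][h] / norm for h in haps}
    return freq


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over haplotype frequency dicts; q is floored at eps."""
    return sum(
        pv * math.log(pv / max(q.get(h, 0.0), eps))
        for h, pv in p.items()
        if pv > 0.0
    )


# Two single-region individuals and one individual of mixed background.
toy_data = [
    ([("h1", "h1")], {"A": 1.0}),
    ([("h2", "h2")], {"B": 1.0}),
    ([("h1", "h2")], {"A": 0.5, "B": 0.5}),
]
estimated = multi_region_em(toy_data)
```

In this toy setup the mixed-background individual still contributes to both regional estimates, rather than being excluded as a per-population EM would require. `kl_divergence` can then compare an estimated regional distribution against a known one in a simulation, in the spirit of the haplotype-recovery measure named in the abstract.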


Updated: 2021-09-17
