Abstract
The human leukocyte antigen (HLA) region is the most polymorphic region of the human genome. Anthropologists use HLA to trace the migration and evolution of populations. However, recent admixture between populations can mask the ancestral haplotype frequency distribution. We present a statistical method that resolves population admixture from high-resolution HLA haplotype frequencies using a non-negative matrix factorization formalism, and we validate it using haplotype frequencies from 56 world populations. The result is a minimal set of source components (SCs) that explains roughly 90% of the total variance in the studied admixtures. These SCs agree with the geographical distribution, phylogenies, and recent admixture events of the studied groups. With the growing population of multi-ethnic individuals, and of individuals who do not report race or ethnicity information, HLA matching for stem-cell and solid organ transplants is becoming more challenging. The presented algorithm provides a framework for breaking down highly admixed populations into SCs, which can be used to better match the rapidly growing population of multi-ethnic individuals worldwide.
Data availability
The authors affirm that all data necessary for confirming the conclusions of the article are present within the article, figures, and tables or in published datasets.
Acknowledgment
We would like to thank the following registries for allowing their data to be used for this study: National Marrow Donor Program/Be The Match, USA; Ezer Mizion Bone Marrow Donor Registry, Israel; OneMatch Stem Cell and Marrow Network, Canada; Australian Bone Marrow Donor Registry, Australia; Matchis: the Dutch Centre for Stem Cell Donors, The Netherlands; Norwegian Bone Marrow Donor Registry, Norway; New Zealand Bone Marrow Donor Registry, New Zealand; Tobias Registry of Swedish Bone Marrow Donors, Sweden; Thai National Stem Cell Donor Registry, Thailand; Welsh Bone Marrow Donor Registry, Wales, UK.
Author Summary
Anthropologists frequently use HLA to trace the migration and evolution of different populations. This is possible because the high linkage among HLA genes leads to the transmission of intact haplotypes from parents to offspring, thereby preserving key ancestral features of a population. Such linkage has been proposed to result from positive selection. However, over the last century humankind has undergone rapid population mixing, producing admixtures of the original populations. Algorithms are therefore required to disentangle these populations, both for population genetics and for improving matching for stem-cell or solid organ transplants.
To address these challenges, we developed a new HLA-based method that identifies population-level admixture using high-resolution HLA haplotype frequencies. This contrasts with existing methods, which estimate individual-level admixture from the genomic distribution of SNPs.
We show that 90% of the genetic variance of the HLA haplotypes can be explained using a much smaller set of 8 source components (SCs). The estimated SCs and admixture models agree with the geographical distribution, population phylogenies, and recent historic admixture events of the studied populations.
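To make the idea of decomposing population-level haplotype frequencies into source components more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a small synthetic populations-by-haplotypes frequency matrix, uses the standard multiplicative-update rule for non-negative matrix factorization, and measures "explained variance" as 1 - RSS / sum(V^2), which is one common definition and may differ from the one used in the article. All names (`nmf`, `explained_variance`, `sources`, `weights`) are hypothetical.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def nmf(v, k, iters=2000, seed=0, eps=1e-12):
    """Factor v (populations x haplotypes) into w (populations x k) and
    h (k x haplotypes) with multiplicative updates for the squared error."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Update the source components: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # Update the admixture weights: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h

def explained_variance(v, w, h):
    """1 - RSS / sum(v^2); an assumed definition, see the note above."""
    approx = matmul(w, h)
    rss = sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra))
    total = sum(x * x for row in v for x in row)
    return 1.0 - rss / total

# Synthetic example: two source components over four haplotypes,
# and five populations that are mixtures of them in known proportions.
sources = [[0.7, 0.2, 0.1, 0.0],
           [0.0, 0.1, 0.3, 0.6]]
weights = [[1.0, 0.0], [0.75, 0.25], [0.5, 0.5], [0.25, 0.75], [0.0, 1.0]]
v = matmul(weights, sources)

w, h = nmf(v, k=2)
print("explained variance with 2 components:", explained_variance(v, w, h))
```

Because the factorization is only defined up to scaling of the components, the rows of `h` would need to be normalized to sum to 1 (with the inverse scaling applied to `w`) before they could be read as haplotype frequencies and admixture proportions. In practice one would repeat the fit for increasing `k` and pick the smallest number of components that reaches the desired explained variance.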
Cite this article
Simanovsky, A.L., Madbouly, A., Halagan, M. et al. Single haplotype admixture models using large scale HLA genotype frequencies to reproduce human admixture. Immunogenetics 71, 589–604 (2019). https://doi.org/10.1007/s00251-019-01144-7
DOI: https://doi.org/10.1007/s00251-019-01144-7