Main

For almost all human diseases, individual susceptibility is, to some degree, influenced by genetic variation. Consequently, characterizing the relationship between sequence variation and disease predisposition provides a powerful tool for identifying processes fundamental to disease pathogenesis and highlighting novel strategies for prevention and treatment.

Over the past 25 years, advances in technology and analytical approaches, often building on major community projects—such as those that generated the human genome sequence1 and elaborated on that reference to capture sites of genetic variation2,3,4,5,6—have enabled many of the genes and variants that are causal for rare diseases to be identified and enabled a systematic dissection of the genetic basis of common multifactorial traits. There is growing momentum behind the application of this knowledge to drive innovation in clinical care, most obviously through developments in precision medicine. Genomic medicine, which was previously restricted to a few specific clinical indications, is poised to go mainstream.

This Review charts recent milestones in the history of human disease genetics and provides an opportunity to reflect on lessons learned by the human genetics community. We focus first on the long-standing division between genetic discovery efforts targeting rare variants with large effects and those seeking alleles that influence predisposition to common diseases. We describe how this division, with its echoes of the century-old debate between Mendelian and biometric views of human genetics, has obscured the continuous spectrum of disease risk alleles—across the range of frequencies and effect sizes—observed in the population, and outline how genome-wide analyses in large biobanks are transforming genetic research by enabling a comprehensive perspective on genotype–phenotype relationships. We describe how the expansion in the scale and scope of strategies for enumerating the functional consequences of genetic variation is transforming the torrent of genetic discoveries from the past decade into mechanistic insights, and the ways in which this knowledge increasingly underpins advances in clinical care. Finally, we reflect on some of the challenges and opportunities that confront the field, and the principles that will, over the coming decade, drive the application of human genetics to enhance understanding of health and disease and maximize clinical benefit.

Rare diseases, rare variants

During the 1980s and 1990s, efforts to map disease genes were focused on rare, monogenic and syndromic diseases and were mostly driven by linkage analysis and fine mapping within large multiplex pedigrees. Localization of genetic signals was typically followed by Sanger sequencing of the genes found to map within the linked locus to identify disease-causing alleles. Assessments of pathogenicity, based on segregation of a putatively causal variant with disease across multiple families and evidence that the risk genotype was absent in healthy individuals, were typically followed by confirmatory functional studies in cellular and animal models. This path to gene identification was laborious; nevertheless, by 2000, around 1,000 of the estimated 7,000 single-gene inherited diseases had been characterized, including many with substantial biomedical impact, such as Huntington’s disease and cystic fibrosis7,8,9.

Completion of the draft human genome sequence1 reduced many of the obstacles to disease-gene mapping and propelled a fourfold increase in the number of genes implicated as causal for rare, single-gene disorders (Fig. 1). Microarray-based detection of structural variation10 and exome- and genome-wide sequencing11,12 have been pivotal, bolstered by in silico analysis and prioritization of the discovered genetic variants. Increasing availability of reference datasets cataloguing population genetic variation across diverse ethnic backgrounds has supported robust causal inference2,3,5,6. More recently, the adoption of high-throughput sequencing technologies has enabled the full range of causal genetic variation, from single mutations to large structural rearrangements, to be identified in a single assay. These technologies have extended from research into clinical usage, driving earlier and faster diagnosis for genetic disorders.

Fig. 1: Growth in the discovery of disease-associated genetic variation.

The cumulative numbers of genes harbouring variants causal for rare, monogenic diseases and traits and of significant GWAS associations implicated in common, complex diseases and traits are shown. Left, the advent of high-throughput sequencing technologies and availability of reference genomes from diverse populations has supported a fourfold increase in the discovery of rare disease-causing genes between 1999 and 2019. Right, international efforts such as the Human Genome Project and the HapMap Project, combined with GWAS and sequencing studies, have supported identification of more than 60,000 genetic associations across thousands of human diseases and traits. Centre, more recent developments have brought a synthesis of the rare- and common-variant approaches based around the combination of sequence-informed analyses in large cohorts. Key events contributing to these themes are depicted in the timeline. GA4GH, Global Alliance for Genomics and Health160; ExAC, Exome Aggregation Consortium5.

Reduced reliance on multiplex pedigrees in favour of collections of affected cases, often with parents13, has proven decisive in identifying new dominant disorders, many of which were previously considered recessive14. Increasingly, discovery of rare disease genes has transitioned from genetic characterization of small numbers of individuals with similar clinical presentations to genome-wide sequencing of larger cohorts of phenotypically diverse patients. This genotype-driven approach has revealed new disorders associated with more variable clinical presentation15,16.

A more systematic approach to data sharing has been critical, both for the characterization of new disorders and diagnostic interpretation of potential causal alleles. The value of sharing genetic and phenotypic data from those thought to harbour rare undiagnosed genetic diseases has fostered global collaborative networks (for example, Matchmaker Exchange, DECIPHER and GeneMatcher) designed to match patients with similar genetic variants and/or phenotypic manifestations, even across continents17,18,19. Interactions between researchers and families with rare disease have enabled natural history studies to be driven by family support groups positioned to initiate data collection from patient cohorts once a causal gene is discovered20.

Clinical translation of these technologies has benefited from a series of information resources, including open databases of genes associated with rare disorders (for example, OMIM and ORPHANET)21, clinically interpreted variants (for example, ClinVar and ClinGen)22,23 and patient records (for example, DECIPHER and MyGene2 (https://mygene2.org/MyGene2))17. Access to resources that catalogue genetic variation across populations (such as ExAC and its successor gnomAD)5,6 has enabled the confident exclusion of genetic variants too common in population-level data to be plausible causes of rare, penetrant early-onset genetic diseases24. These analyses have reduced the contamination of databases with variants erroneously interpreted as causal for disease, and are addressing the overestimation of disease penetrance arising from the historical focus on multiplex pedigrees25. Improved recognition of the variable penetrance of many ‘monogenic’ disease alleles has invigorated efforts to identify the genetic and environmental modifiers responsible26,27.
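The frequency-based exclusion described above can be illustrated with a short sketch. The Python below is a minimal example under stated assumptions, not the procedure used by any particular database: it assumes a dominant disorder and one commonly used formulation of a maximum credible allele frequency (prevalence multiplied by the largest share of cases any single variant could explain, divided by penetrance, then halved to convert from individuals to chromosomes). The function names, variant identifiers and all numeric inputs are hypothetical.

```python
def max_credible_af(prevalence, max_allelic_contribution, penetrance):
    """Upper bound on the population allele frequency of a dominant pathogenic variant.

    Assumed formulation (an illustration, not taken from the text):
    prevalence * max_allelic_contribution / penetrance, divided by 2
    because each individual carries two chromosomes.
    """
    if not 0 < penetrance <= 1:
        raise ValueError("penetrance must be in (0, 1]")
    return prevalence * max_allelic_contribution / penetrance / 2


def too_common(variants, threshold):
    """Return the IDs of variants whose population frequency exceeds the threshold."""
    return [v["id"] for v in variants if v["pop_af"] > threshold]


# Hypothetical inputs: prevalence of 1 in 500, no single variant explaining
# more than 2% of cases, and penetrance of 50%.
threshold = max_credible_af(prevalence=1 / 500,
                            max_allelic_contribution=0.02,
                            penetrance=0.5)

# Hypothetical variant records with made-up population allele frequencies.
variants = [
    {"id": "var_A", "pop_af": 1e-6},
    {"id": "var_B", "pop_af": 3e-4},
]
excluded = too_common(variants, threshold)
```

In this sketch only `var_B` is flagged as too common to be a plausible penetrant cause. A practical filter would also need to allow for sampling error in the observed allele counts, which is omitted here for brevity.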

Although huge strides have been made in associating specific genes with particular disorders, establishing the causal role of individual variants within those genes remains problematic, and many patients with suspected rare genetic diseases are left without a definitive diagnosis28. Even for variants with established causality, the penetrance is often unclear. Resolving these uncertainties represents the central challenge for the field. Aggregation of sequencing data from large numbers of affected cases and population reference samples will provide the evidence base required for robust interpretation of variants. Highly parallelized in vitro cellular assays that allow assessment of the functional effects of all variants in a disease-associated gene can transform interpretation of novel variants29, although developing well-calibrated functional assays predictive of pathogenicity for all disease genes represents a daunting prospect. Direct functional genomic exploration of accessible and disease-relevant tissues from patients using RNA sequencing and DNA methylation assays30,31 can identify previously cryptic causal genetic variants, particularly in under-explored regions outside protein-coding genes32,33. Developments in each of these areas will extend the range of variants and genes for which diagnostic and prognostic clinical information can be provided to patients and their families.

Common diseases, common variants

Efforts to apply the approach—linkage analysis in multiplex pedigrees—that had been so successful for the high-penetrance variants responsible for Mendelian disease were, with notable exceptions34,35,36, largely unsuccessful for common, later-onset traits with more complex multifactorial aetiologies, such as asthma, diabetes and depression. Recognition that association-based methods, focused on detecting phenotype-related differences in variant allele frequencies, might have greater traction for identifying less penetrant common alleles redirected attention to analysis of case–control samples37. However, initial efforts targeting variants within ‘candidate’ genes were plagued by inadequate power, unduly liberal thresholds for declaring significance and scant attention to sources of bias and confounding, resulting in overblown claims and failed replication.

Systematic efforts to characterize genome-wide patterns of genomic variation, initially through the HapMap Consortium2, proved catalytic, demonstrating that the allelic structure of the genome was segmented into haplotype blocks, each containing sets of correlated variants. Recognition that this configuration could support genome-wide surveys of association energized the technological innovation—in the form of massively parallel genotyping arrays—to make such studies possible (Fig. 1). Early wins in age-related macular degeneration38 and inflammatory bowel disease39 were encouraging, and progress on several fronts—expansion of study size, denser genotyping arrays, novel strategies for imputation, attention to biases and appropriate significance thresholds—delivered robust associations across a range of diseases40. Most variants uncovered by these early genome-wide association studies (GWAS) were common, with more subtle effects than many had anticipated. A host of trait-specific consortia formed, covering diverse dichotomous and quantitative phenotypes, to accelerate genetic discovery through the aggregation and meta-analysis of data from multiple GWAS41,42,43. Many tens of thousands of robust associations were identified44. Recently, increased access to exome and whole-genome sequence data has, through both direct association analysis45,46 and imputation3,4, extended discovery to low-frequency and rare alleles previously inaccessible to GWAS.

In the decade since the first GWAS, understanding of the genetic basis of common human disease has been transformed. The disparity between the observed effects of the variants first identified by GWAS and estimates of overall trait heritability (the ‘missing heritability’ conundrum) is now largely resolved47. Common diseases are not simply aggregations of related Mendelian conditions: for most complex traits, genetic predisposition is shared across thousands of mostly common variants with individually modest effects on population risk41,43.

Although the collective contribution of low-frequency and rare risk alleles to overall trait variability appears modest compared with that attributable to common variants45,48, the rare risk alleles detected in current sample sizes necessarily have large phenotypic effects and are proportionately more likely to be coding, enhancing their value for biological inference. Founder populations (such as those from Finland and Iceland) have provided multiple examples of otherwise rare risk alleles driven to higher frequency locally through drift and/or selection49,50,51,52. In addition, studies in populations with high rates of consanguinity make it possible to identify individuals homozygous for otherwise rare loss-of-function alleles, the basis for a ‘human knockout’ project to systematically investigate the phenotypic consequences of gene disruption in humans53,54.

For most diseases, large-scale GWAS-aggregation efforts have been disproportionately powered by information from individuals of European descent55. Whereas patterns of genetic predisposition appear broadly similar across major population groups and many common risk alleles discovered in one population group are detectable in others, allele frequencies can vary substantially; extending GWAS and sequencing studies to diverse populations will surely generate a rich harvest of novel risk alleles.

The relative contributions of common and rare variants indicate that, for many traits, particularly those with post-reproductive onset, purifying selection has had only limited effect45,56. For a few risk alleles, hallmarks of balancing selection reflect increased carrier survival, usually through protection from infectious diseases. This includes well-known examples of alleles maintained at high frequency in populations of African descent57,58.

While the extensive linkage disequilibrium within human populations has been essential to discovery in GWAS, high correlation between adjacent variants frustrates mapping of the specific variants responsible for these associations. Increasing sample size, improved access to trans-ethnic data, and more representative imputation reference panels3 provide a path to improved resolution of the causal variants59 and clues to the molecular mechanisms through which they operate. Functional interpretation is easiest for causal variants within coding sequences; however, most common disease-risk variants map to noncoding sequences, and are presumed to influence predisposition through effects on transcriptional regulation. In these cases, mechanistic inference depends on connecting association signals to their downstream targets (see below). For many traits, there is clear convergence between common-variant association signals and genes implicated in monogenic forms of the same disease, as well as enrichment of GWAS signals in regulatory elements specifically active in cell types consistent with known disease biology60,61. This provides reassurance that, even as the number of association signals for a given disease proliferates, the genetic associations uncovered will coalesce around molecular and cellular processes with a core role in pathogenesis62,63.

Importantly, the signals discovered by GWAS have revealed many unexpected insights into the biological basis of complex disease. Examples include the role of complement in the pathogenesis of age-related macular degeneration38, synaptic pruning in schizophrenia64 and autophagy in inflammatory bowel disease65. In addition, as inherited sequence variation is a prominent cause of phenotypic variation (but the reverse is not true), risk variants identified by GWAS have value as genetic instruments, mapping causal relationships between traits and inferring contributions made by circulating biomarkers and environmental exposures to disease development66.
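The use of risk variants as genetic instruments can be made concrete with a small sketch. The Python below assumes one common formalization that the text does not name explicitly: a per-variant Wald ratio (the variant's effect on the outcome divided by its effect on the exposure) and a fixed-effect inverse-variance-weighted combination across variants. All effect sizes and standard errors are invented for illustration, and the sketch ignores pleiotropy and weak-instrument bias, which a real analysis would have to address.

```python
def wald_ratio(beta_exposure, beta_outcome):
    """Causal effect estimate from a single variant used as an instrument."""
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    return beta_outcome / beta_exposure


def ivw_estimate(instruments):
    """Inverse-variance-weighted estimate across several instruments.

    Each instrument is a tuple (beta_exposure, beta_outcome, se_outcome).
    Weights use only the outcome standard error (a first-order approximation).
    """
    numerator = sum(bx * by / se_y ** 2 for bx, by, se_y in instruments)
    denominator = sum(bx ** 2 / se_y ** 2 for bx, by, se_y in instruments)
    return numerator / denominator


# Hypothetical summary statistics: (effect on exposure, effect on outcome, SE of outcome effect).
instruments = [
    (0.10, 0.050, 0.01),
    (0.20, 0.100, 0.02),
    (0.05, 0.025, 0.01),
]

per_variant = [wald_ratio(bx, by) for bx, by, _ in instruments]
combined = ivw_estimate(instruments)
```

Because every hypothetical variant here implies the same ratio, the combined estimate equals the per-variant estimates; with real data, disagreement between variants is itself informative about violations of the instrument assumptions.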

As described below, findings from GWAS have increasing translational impact through identification of novel therapeutic targets67, prioritization (and deprioritization) of existing ones68 and development of polygenic scores that quantify individual genetic risk69.
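A polygenic score of the kind mentioned above is, in its simplest form, a weighted sum of risk-allele counts. The following Python is a minimal sketch under that assumption; the variant identifiers, weights and genotype dosages are hypothetical, and practical pipelines add steps not shown here (variant selection, allele harmonization, adjustment for linkage disequilibrium and ancestry).

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per variant).

    dosages: mapping of variant ID to allele count for one individual
    weights: mapping of variant ID to per-allele effect size
             (for example, a log odds ratio from a GWAS)
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"no genotype for variants: {sorted(missing)}")
    return sum(weights[v] * dosages[v] for v in weights)


# Hypothetical per-allele weights and one individual's genotype dosages.
weights = {"variant_1": 0.12, "variant_2": -0.05, "variant_3": 0.30}
dosages = {"variant_1": 2, "variant_2": 1, "variant_3": 0}

score = polygenic_score(dosages, weights)
```

The raw score is only meaningful relative to a reference distribution, so in use it would typically be standardized against scores from a comparable population.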

Comprehensive genotype–phenotype maps

The historical division of disease-gene discovery into monogenic and polygenic strands arose from development and implementation of analytical approaches—family-based linkage and case–control association37—that are best-suited for detecting particular subsets of causal alleles. This obscured the true state of nature, with disease-risk alleles being distributed across a continuous spectrum of frequencies and effect sizes. In addition, the trait- and disease-specific perspective of early GWAS discovery (mostly reliant on case–control studies) was poorly equipped to investigate the contribution of genetic variants to phenotypic effects that are nested within or spread across classical disease definitions. Recent developments have enabled a more holistic perspective on genotype–phenotype relationships (Fig. 1).

One major advance has been the increasing availability of large prospective population-based cohorts. These biobank efforts, pioneered in studies such as the Framingham Cohort70 and the efforts of DeCODE in Iceland71,72, now encompass a growing inventory of national cohorts in North America, Europe, Asia and beyond73,74,75,76. The UK Biobank study, including 500,000 largely healthy, middle-aged participants, has been particularly influential, transforming human genetic research in part through permissive data-sharing policies that have allowed multiple research groups to analyse the data74. Efforts to make clinical data embedded in electronic health records and registries available for research77,78 mean that biobanks increasingly provide access to a wide range of demographic, clinical and lifestyle data, captured in harmonized, systematic fashion from large, often multi-ethnic collections of individuals. For millions of biobank participants, this rich phenotypic information has been combined with genome-wide genetic data. There are nascent efforts to capture transcriptomic, proteomic and metabolomic phenotypes, although these are not yet at equivalent scale to the genetic data79,80. Biobank analyses have provided more generalizable estimates of the relevance of genetic risk factors in the context of the separate and joint effects of non-genetic factors81. Increasingly, integration with healthcare data brings a longitudinal dimension to phenotypic characterization, which facilitates analyses of disease progression and lifelong disease risk82.

The rich phenotypic scope of these cohorts has enabled variants of interest to be interrogated for associations across the gamut of available phenotypes. These phenome-wide association studies (PheWAS) have revealed the extent to which many variants have pleiotropic effects across multiple traits83. Some of these relationships are expected, such as the impact of obesity variants on risk of hepatic steatosis and type 2 diabetes84 or variants that influence multiple autoimmune conditions85. Others connect diseases and traits in surprising ways, highlighting shared polygenic, pleiotropic effects and cell-type specificity, and delivering insights into shared biology and overlapping mechanisms86,87. These findings inform the prioritization of therapeutic targets, providing clues to potential on-target side effects and opportunities for drug repurposing87,88,89.

The second enabler of inclusive, systematic analysis of genotype–phenotype relationships has been access to whole-genome sequence data. The scale of genetic analysis based on sequence data still lags behind that of genome-wide genotyping data (the largest sequence-based datasets are one tenth the size of the largest GWAS90,91,92), although reductions in sequencing costs are decreasing the differential. Most direct analysis of high-throughput sequence data has focused on the coding regions. Strategies for assigning variant function and jointly analysing sets of variants of similar functional effect have enabled aggregate, gene-level tests of rare functional-variant association that are often better powered than single-variant tests91,92. However, the principal benefit to date of whole-genome sequence data to genetic discovery has been to bolster array-based access to lower-frequency alleles, either directly, through their inclusion on genotyping platforms, or indirectly, through imputation from sequence-based reference samples3,4.
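The aggregate, gene-level tests described above can be sketched as a simple collapsing analysis. The Python below is one possible illustration, not the method of any cited study: each individual is scored as a carrier if they hold at least one qualifying rare variant in the gene, and carrier counts in cases and controls are compared with a one-sided Fisher exact test. The sample records, variant names and the choice of test are assumptions made for this example.

```python
from math import comb


def is_carrier(genotypes, qualifying):
    """True if the individual carries at least one qualifying variant."""
    return any(genotypes.get(v, 0) > 0 for v in qualifying)


def burden_table(samples, qualifying):
    """Return (case carriers, case non-carriers, control carriers, control non-carriers)."""
    table = [0, 0, 0, 0]
    for s in samples:
        row = 0 if s["is_case"] else 2
        col = 0 if is_carrier(s["genotypes"], qualifying) else 1
        table[row + col] += 1
    return tuple(table)


def fisher_one_sided(a, b, c, d):
    """P(at least `a` case carriers) under the hypergeometric null with fixed margins."""
    n_cases, n_carriers, n = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_carriers)
    favourable = sum(comb(n_cases, k) * comb(n - n_cases, n_carriers - k)
                     for k in range(a, upper + 1))
    return favourable / comb(n, n_carriers)


# Hypothetical gene with three qualifying rare variants; "v_common" does not qualify.
qualifying = {"v1", "v2", "v3"}
samples = [
    {"is_case": True, "genotypes": {"v1": 1}},
    {"is_case": True, "genotypes": {"v2": 1}},
    {"is_case": True, "genotypes": {"v3": 1}},
    {"is_case": True, "genotypes": {}},
    {"is_case": False, "genotypes": {"v_common": 1}},
    {"is_case": False, "genotypes": {}},
    {"is_case": False, "genotypes": {}},
    {"is_case": False, "genotypes": {}},
]

table = burden_table(samples, qualifying)
p_value = fisher_one_sided(*table)
```

No single variant in this toy dataset is seen more than once, so a per-variant test has almost no information; pooling carriers across the gene is what gives the comparison any power, which is the rationale for aggregation given in the text.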

These developments have enabled researchers to bridge the gap between the monogenic and polygenic realms, identifying common variant modifiers of monogenic phenotypes contributing to the variable expression of rare, large-effect alleles26,93, and low-frequency and rare variants that influence common multifactorial traits94,95. This enables more rigorous evaluation of the contribution of rare and common variants to trait susceptibility48 and supports the enumeration of ‘allelic series’ (sets of alleles of varying frequency, effect size and direction that disrupt the same gene) critical for studies of disease mechanism and therapeutic target optimization89,96. These developments are rapidly converging towards the ultimate destination: a comprehensive matrix of the effect of all observable genetic variants across the widest possible range of cross-sectional and longitudinal biomedical phenotypes. Success in this endeavour depends on ever greater harmonization between, and integration of results from, individual studies through sustained investments in data sharing.

Adding function

From the first linkage maps to whole-genome sequencing of large cohorts, human genetics has deployed increasingly sophisticated and inherently systematic approaches for mapping the genetic factors that underlie traits and diseases. However, progress in determining how these variants influence disease, through systematic interrogation of their functional effects on molecular, cellular and physiological processes, has been far slower.

For monogenic diseases, for which the alleles responsible are typically rare, penetrant and coding, genetic approaches have generally been both necessary and sufficient to implicate a gene as causal28. However, as efforts to elucidate the genetic basis of Mendelian disorders progress towards completion97, functional studies remain important to understand the mechanisms by which disruptive variation within a causal gene leads to disease phenotypes. Unlike common diseases, the clarity of causation for Mendelian disorders usually simplifies the task of generating models (including human cells and organoids or rodents) to connect genotype to organismal phenotype; these have led to many critical insights into the biology of health and disease in humans98,99. In addition, for genes harbouring variants with medically actionable consequences (as with the BRCA1 and BRCA2 mutations that are causal for early-onset breast and ovarian cancer), functional studies can support the translational interpretation of novel alleles identified by medical sequencing29.

For common diseases, functional studies have a more fundamental role. Although tens of thousands of associations have been discovered across thousands of common human diseases and traits44, multiple factors have frustrated efforts to convert these genetic signals to knowledge about causal variants, genes and mechanisms. For the common variants that underlie the bulk of complex-disease risk, the resolution of association mapping is often limited by the haplotype structure of the human genome2,3,4. Furthermore, most GWAS associations map to the noncoding genome and thus lack a direct address to the gene that mediates their effects. Growing appreciation of the pervasive role of pleiotropy complicates matters: many variants identified by GWAS are associated with multiple traits and exert diverse effects across multiple cell types100.

To date, relatively few studies have achieved the goal of connecting variants causal for complex traits to the molecular and cellular functions that mediate that predisposition. One early success described how regulatory variants that modulate SORT1 expression influence low-density lipoprotein cholesterol and myocardial infarction risk101. More recent examples have focused on the relationship between obesity-associated variants intronic to FTO, altered expression of IRX3 and IRX5, and adipocyte102 and hypothalamic103 function. Similar functional descriptions have been reported for individual loci implicated in schizophrenia64, cardiovascular disease104, type 2 diabetes105 and Alzheimer’s disease106, among others.

Over the past decade, the challenge for the functional genomics community has been to convert this ‘one-locus-at-a-time’ workflow to a systematic, multidimensional, integrative approach able to deliver genome-scale functional analyses to match genome-wide variant discovery (Fig. 2). At the molecular level, one cornerstone has been generation of genome-wide catalogues of functional activity. For example, the ENCODE and Roadmap Epigenomics projects have generated maps of histone modifications, transcription-factor binding, chromatin accessibility, three-dimensional genome structure and other regulatory annotations across hundreds of cell types and tissues107,108. The patterns of genomic overlap between these data and GWAS results enable the functional inference of risk variants, deliver clues to the specific cell types driving disease pathogenesis60,109 and accelerate locus-specific mechanistic insights.

Fig. 2: Genetic discovery is paralleled by advances in functional genomics technologies.

Top, the growth in the number of genetic loci associated by GWAS with human traits and diseases (bars) and of variant-to-function studies (area under line, not to scale). Bottom, foundational technological and computational advances over the last decade that enabled (1) development of systematic, genome-wide catalogues of functional elements across multiple cell types and tissues (blue); (2) mapping of QTLs in the context of gene expression, metabolites, proteins and regulatory elements (red); (3) engineering of genes, genetic elements and genetic variation at increasing scale (orange); and (4) systematic tissue-specific surveys of regulatory elements and transcription (grey). scRNA-seq, single-cell RNA-sequencing analysis; ChIA-PET, chromatin interaction analysis by paired-end tag sequencing; ChIP–seq, chromatin immunoprecipitation followed by sequencing; FAIRE-seq, formaldehyde-assisted isolation of regulatory elements with sequencing; DHS-seq, DNase I-hypersensitive sites sequencing; ATAC-seq, assay for transposase-accessible chromatin using sequencing; MPRA, massively parallel reporter assay; STARR-seq, self-transcribing active regulatory region sequencing; CNN, convolutional neural networks. For further details and primary literature on many of these assays, see ref. 173.

In parallel, there has been a scaling of efforts to connect trait-associated regulatory variants to the genes and processes that they regulate in cell types relevant to the disease of interest110,111. For example, the GTEx (Genotype-Tissue Expression) consortium has mapped thousands of expression quantitative trait loci (QTLs) across hundreds of individuals and dozens of tissues112. Further clues to the relationships between regulatory variants and their effector genes can be gathered from DNA proximity assays (such as Hi-C) and single-cell data113 (Fig. 2). Programs such as HuBMAP114 and the Human Cell Atlas115 are set to deliver comprehensive, high-resolution reference maps of individual human cell types across diverse developmental stages, providing new opportunities to understand how regulatory genetic variation results in cellular and organismal phenotypes.

Efforts to probe the clinical consequences of coding alleles with large phenotypic effects (particularly null alleles) in humans53,54 and across diverse animal models116 represent powerful strategies for extending functional analyses to the whole-body level. Connections between genetic variation and circulating proteomic and metabolomic data provide additional mechanistic links between cellular events and whole-body physiology79,80. These efforts are paralleled by PheWAS approaches83, which, by mapping variant effects across the range of traits available in biobanks and electronic medical records (EMRs), can inform priors for cell types and pathways at individual loci. Importantly, whereas early studies typically linked GWAS risk alleles to data from a single functional assay, the focus is increasingly on maximizing biological insight through the multidimensional integration of multiple genome-wide data types using approaches such as heritability partitioning117, functional enrichment analyses60,109, integration of the three-dimensional genome structure118 and deep convolutional neural networks119,120.

Although QTL analyses can implicate a haplotype in a molecular, cellular or organismal phenotype, they are, in isolation, insufficient to define the specific causal variants responsible. To address this, there has been rapid maturation of technologies, such as massively parallel reporter assays121,122,123 and CRISPR genome editing, to support functional characterization of targeted sequence perturbations at scale. Variations on these methods enable the functional evaluation of genes (via knockout screens124), regulatory elements (using CRISPR interference and CRISPR activation screens125,126), and genetic variants (base editors127) at increasing scale and resolution29. Combined with complex readouts—including high-content imaging128 and single-cell transcriptomics and epigenomics129,130—these methods can generate empirical ‘truth’ data, supporting the development of in silico models to predict causal variants, effector transcripts126 and cellular effects. In due course, such models should reduce the need for exhaustive experimental characterization of function for all variants across all cell types.

The goal of such efforts is to enumerate the cascade of molecular events that underlie observed genotype–phenotype associations using physiologically relevant cellular systems (from primary cells to organoids and ‘organ-on-chip’ designs) and whole-body assays appropriate to the disease of interest. Collectively, strategies that offer large-scale functional evaluation of variants and genes of interest will reduce (but probably not eliminate) the intensive effort required for ‘final mile’ validation of disease mechanisms in dedicated systems, thereby accelerating downstream translational application.

Clinical implementation

Medical genetics, as applied to rare diseases, has been characterized by the rapid application in the clinic of the transformative genomic technologies that drove initial research discoveries. There are now targeted genetic tests for nearly all clinical presentations attributable to large-impact alleles, alongside more extensive genome-sequencing assays that, when necessary, enable interrogation of a longer list of relevant genes. Genetic testing for symptomatic individuals and at-risk relatives occurs routinely in many medical specialties. In parallel, the use of somatic cancer testing has increased as therapies targeted to specific mutational events have entered clinical practice (these developments are reviewed elsewhere131,132).

For patients with symptoms that indicate a probable monogenic aetiology (such as retinal degeneration, hearing loss or cardiomyopathy), targeted panels are typically the platform of choice133, although they are increasingly performed on a more extensive sequence backbone. For more complex phenotypes—those without a clear match to a specific syndrome, such as neurodevelopmental disorders and multiple congenital anomalies—testing has gravitated towards early deployment of exome and genome-sequencing platforms that offer speedy resolution of what has historically often been a traumatic diagnostic odyssey15,134. The power of genomic diagnosis is especially clear for those presenting with monogenic neurodevelopmental disorders and critically ill infants135,136. Sequencing of the parent–offspring trio can detect de novo variation in dominant disorders and phase biallelic rare variants in recessive disease13.

The transition from targeted gene tests to genomic sequencing enables recursive reanalysis, including reinterpretation of individual sequences on the basis of subsequent discoveries regarding causal disease alleles and their phenotypic consequences137. However, improved molecular diagnostics are required to ensure reliable detection of a subset of genetic disorders, including those arising from triplet repeats and complex rearrangements138. Deep sequencing of affected tissues for mosaic variants and the use of RNA sequencing to detect noncoding variants that drive early-onset disease (for example, through effects on splicing) represent new fronts for clinical diagnostics30.

Other examples of the rapid adoption of new genomic technologies include noninvasive prenatal testing (more than ten million tests by 2018 across multiple countries139,140,141) and the use of recessive carrier panels for couples planning pregnancies. Newborn screening is now universal in many countries, although it is limited to disorders combining high-throughput low-cost detection with effective early interventions (such as diet restrictions or enzyme replacement)142. Genetic diagnostics are also increasingly applied to newborn screening as a reflex test following an abnormal (for example, metabolic) screening test143. Over the next decade, the repertoire of disorders captured by neonatal screening and prenatal testing is likely to expand markedly. Whereas prenatal testing may be more effective at avoiding disease, the associated ethical issues are more complex144.

Although genetic testing for rare disease and cancer has exploded, there has been more limited uptake of genetic information in other aspects of healthcare. For example, despite multiple examples of clinically important genetic markers related to drug efficacy and side-effect profile145, the roll-out of pharmacogenetics has been hampered by a range of factors, including lack of clinical decision support in electronic medical systems to guide the drug choice or dosing by the physician. This has been compounded by challenges in diagnostic testing: complex haplotype structures and structural variants at some key drug metabolism loci necessitate genome sequencing or specific targeted panels to detect all clinically relevant variants.

For common diseases, translational attention is currently focused on the clinical potential of polygenic risk scores. The development of robust polygenic scores for several common diseases has been catalysed by more precise per-variant effect estimates from larger GWAS datasets, improved algorithms for combining information across millions of single-nucleotide polymorphisms, and large-scale biobanks that support score validation69,146,147. For example, a genome-wide polygenic score for heart attack, incorporating 6.6 million variants, indicates that 5% of European-descent individuals have a risk of future cardiac events equivalent to that seen in those with less frequent monogenic forms of hypercholesterolaemia69. Increasingly, the shift from array-based genotyping to sequence-based analysis is facilitating risk prediction that integrates information from rare, large-effect alleles with that from polygenic scores93. By improving the capture of genetic risk, particularly in non-European populations, and integrating environmental and biomarker data to quantify aspects of non-genetic risk, it should be possible to achieve increasingly accurate prediction of individual disease risk, and to use this information to tailor screening, prevention and treatment. Success will depend on developing models of risk that robustly integrate these diverse data types and on optimizing the strategies deployed to ensure effective implementation.
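In its simplest additive form, a polygenic score is a weighted sum of an individual's effect-allele dosages, with weights taken from per-variant GWAS effect estimates. The sketch below is illustrative only: the variant identifiers, weights and function names are hypothetical, and it is not the method used to derive the published scores, which rely on more elaborate algorithms for adjusting weights across millions of variants.

```python
from typing import List, Mapping, Sequence


def polygenic_score(weights: Mapping[str, float],
                    dosages: Mapping[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0 to 2; fractional if imputed)."""
    # Variants absent from the individual's data are skipped here (assumption);
    # a real pipeline would more likely impute them.
    return sum(beta * dosages[variant]
               for variant, beta in weights.items()
               if variant in dosages)


def top_fraction(scores: Sequence[float], fraction: float) -> List[int]:
    """Indices of individuals whose score lies in the top `fraction` of a cohort."""
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_top])


# Hypothetical per-variant weights (for example, log odds ratios).
example_weights = {"variant_a": 0.2, "variant_b": -0.1, "variant_c": 0.05}
example_dosages = {"variant_a": 2, "variant_b": 1}
example_score = polygenic_score(example_weights, example_dosages)
```

Ranking scores within a reference cohort, as `top_fraction` does, is one simple way to identify a high-risk tail such as the top 5% of a score distribution. Translating that rank into absolute risk requires calibration in a population that matches the individual.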

The absence of evidence-based guidelines to support healthcare recommendations continues to hinder the clinical applications of genetic data. In some countries, this is compounded by confusion over reimbursement and disparities in testing across society148. Many healthcare professionals lack experience in genomic medicine and need education and guidance to practice in the rapidly evolving space of genetic and genomic testing149. One consequence of these difficulties has been an expanding direct-to-consumer testing market, variably controlled by country-specific regulations150, which is moving beyond a focus on ancestry and personal traits, towards models in which individuals have direct access to ordering physicians and genetic counselors151. The risk of commercial influence in this model remains high. There are concerns about the consequences of unfettered release of genetic data of dubious or inflated clinical relevance, and limited infrastructure to pull these results into mainstream medical systems.

These advances have fostered debate about the value of genetics for population screening, for both monogenic and complex disorders. Population screening for monogenic disorders is most likely to be initiated for conditions for which risk estimates are well-understood and there are actionable interventions (for example, Lynch syndrome and familial hypercholesterolaemia). Expansion to other disorders requires better understanding of the penetrance of pathogenic alleles in unselected populations152 and caution before extending screening to longer lists of genes that are less securely implicated in disease causation153. As certain countries consider universal capture of genome-wide genetic data at birth or later in life, key questions concern the strategies for releasing this information to citizens and their medical teams to support individual healthcare.

Ultimately, barriers to genomic medicine are most directly overcome by demonstrating clinical utility in disease management and therapeutic decision-making, with evidence for improved patient outcomes. Hereditary cancers provide multiple examples, such as the use of BRCA1/BRCA2 testing to inform PARP inhibitor treatment in patients with cancer154. There is a growing list of diseases for which a molecular diagnosis results in specific interventions designed to improve patient outcomes (https://www.ncbi.nlm.nih.gov/books/NBK1116/) (some examples are listed in Table 1), and there are currently more than 50 FDA-approved drugs for genetic disorders155. Although gene therapy has been slow to evolve since its early introduction, recent advances in gene editing are reinvigorating approaches to treat disorders by manipulation of the underlying genetic defects156.

Table 1 Examples showing progress in the clinical utility of genetic testing

Looking forward

Over the coming decade, the challenge will be to optimize, and to implement at scale, strategies that use human genetics to further the understanding of health and disease, and to maximize the clinical benefit of those discoveries. Realizing these goals will require the concerted effort of researchers in academia and industry to bring about transformational change across a range of highly interconnected domains, for example, through the auspices of the recently established International Common Disease Alliance (https://www.icda.bio). Such efforts will be directed towards establishing: (a) comprehensive inventories of genotype–phenotype relationships across populations and environments; (b) systematic assays of variant- and gene-level function across cell types, states and exposures; (c) improved scalable strategies for turning this basic knowledge into fully developed molecular, cellular and physiological models of disease pathogenesis; and (d) application of those biological insights to drive novel preventative and therapeutic options.

The first of these will involve documenting the full spectrum of natural genetic variation across all human populations, including capture of structural variants and of somatic mutations that accumulate with aging157,158, and associating these variations with the ever-richer disease-related intermediate and clinical traits available through biobanks and electronic health records. It will be particularly important to include populations historically under-represented in genomic research, following the pioneering work of the H3Africa consortium159. Because clinically sequenced genomes will, over time, outnumber those collected in academia, research and healthcare communities will need to develop a harmonized approach to genomics to transcend historical boundaries. Progress will be critically dependent on platforms and governance that lower barriers to the integration of genetic and phenotypic data across studies and countries, along with technical standards that are reliable, secure and compatible with the international regulatory landscape160.

Mechanistic interpretation of genetic associations, particularly those in regulatory regions, will be driven by the systematic annotation of sequence variants and genes for functional impact across disease-relevant cell types, enabling mapping of processes contributing to disease development with respect to place (tissue and cell type), time (developmental stage) and context (external influences)161. Accelerating efforts to characterize the cellular composition of tissues through single-cell assays115 will increase the granularity of these observations. Large-scale perturbation studies across diverse cellular and animal models will, together with analyses of coding variants in humans53,54, provide confidence in causal inference. Large-scale proteomic and metabolomic analyses (in tissues and biological fluids) will provide a bridge to downstream pathways79,80. Research access to such functional data, generated at scale, should lower the barriers to mechanistic inference, provide system-wide context and enable researchers to focus wet-laboratory validation on the most critical experiments. Collectively, these efforts will support compilation of a systematic catalogue of key networks and processes that influence normal physiology and disease development and inform a revised molecular taxonomy of disease.

This knowledge will reinforce the essential contribution of human genetics to the identification and prioritization of targets for therapeutic development89,162. Insights into the efficacy of target perturbation and potential for adverse events, allied to characterization of translatable biomarkers, provide ways to boost the efficiency of drug-development pipelines162. Given the clinical importance of slowing disease progression163, target-discovery efforts will increasingly need to embrace the genetics of disease progression and treatment response, as these may involve processes distinct from those captured by studies of disease onset.

In parallel, the clinical use of human genetics will benefit from progress towards universal determination of individual genome sequences built through a combination of biobank expansion and direct access within healthcare systems. This will power clinical applications that extend beyond the current focus on neonatal sequencing, Mendelian diagnostics and somatic tumour sequencing164. In particular, improvements in polygenic score derivation will boost risk prediction for multifactorial traits, provide a molecular basis for disease classification, support biomarker discovery and therapeutic optimization and contribute to understanding of the variable penetrance of monogenic conditions69. Implementing genomic medicine as a routine component of clinical care across diverse healthcare environments will inevitably require investment in the training of healthcare professionals and attention to optimal strategies for returning genetic findings to patients.

The limited heritability of many multifactorial traits constrains the clinical precision available from genetic data alone. This will drive efforts to integrate information on personal environment, lifestyle and behaviour, and to combine prognostic, predictive information on disease risk with longitudinal measures of molecular and clinical state that track an individual’s journey from health to disease. Human genetics will also, given its unique potential for causal inference, support identification of the non-genetic risk factors (often modifiable) that directly contribute to disease predisposition and development165. As polygenic score performance improves, analysis of individuals who show marked divergence between genetic predisposition and real-world clinical outcomes should define exposures (such as lifestyle choices or gut microbiome) whose contribution to disease causation remains unclear166.

Collectively, these developments can be expected to accelerate personalization of healthcare delivery. Provided costs are sustainable, a more preventative perspective on health could emerge, managed through proactive genomic, clinical and lifestyle surveillance using risk scores, complex biomarkers, liquid biopsies and wearables. Improved understanding of aetiological heterogeneity, patterns of sharing of genetic risk across diseases, variation in therapeutic response and risk of adverse events will enhance targeting of preventative and therapeutic interventions167. At the population level, intervention strategies will seek to combine population-wide and targeted strategies to best effect168. It will be critical to ensure that these benefits are available to as many as possible, so that genomics reduces, rather than exacerbates, national and global health disparities55,169 (Box 1).

The developments described above represent variations on the theme of ‘reading’ the genome. The emerging capacity to block this reading (for example, through siRNA therapies170) or even to ‘write’ the genome (through CRISPR editing) promises to be equally transformative, providing new opportunities to correct, and even cure, Mendelian disease. Spectacular advances in developing novel therapeutic strategies are likely for many diseases, based, for example, on ex vivo cellular manipulation171 or in vivo somatic cell editing172.

Importantly, developments in genomic medicine need to proceed in a bioethical framework for research and clinical use that recognizes the personal relevance of human genetics and the critical importance of autonomous consent and the protection of privacy, while minimizing the adverse consequences of genetic exceptionalism. Governance needs to reaffirm the rights of citizens to make individual contributions to scientific progress through research participation and encourage the responsible exchange of data for clinical and research purposes.

Future prospects

Over the past two decades, understanding of the genetic basis of human disease has been transformed by a combination of spectacular technological and analytical advances, collaborative commitment to the development of foundational resources and the collection and analysis of vast amounts of genetic, molecular and clinical data. The biological insights derived from these data are, increasingly, drivers of translational innovation, and widening personal access to large-scale genetic and molecular data promises to reshape medical care.

However, for the full potential of genomic medicine to be realized, there will need to be sustained collaborative endeavour on several fronts to ensure that the capacity to generate ever more detailed maps of the relationships between sequence variation and biomedical phenotypes delivers a comprehensive understanding of disease mechanisms that can be translated into the medicines of tomorrow.