Abstract
Over the last decade, next-generation sequencing (NGS) has been used extensively to identify new pathogenic mutations and genes that cause rare genetic diseases. Efficient analysis of NGS data is not trivial: it requires a technically and biologically rigorous pipeline that addresses data quality control, accurate variant filtration to minimize false positives and false negatives, and prioritization of the remaining genes based on disease genomics and physiological knowledge. This review presents a pipeline that includes all of these steps, describes popular software for each step of the analysis, and proposes a general framework for identifying causal mutations and genes in individual patients with rare genetic diseases.
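As a rough illustration of the three stages named above (quality control, variant filtration, gene prioritization), the following minimal Python sketch filters a list of variants and ranks the surviving genes. It is not the pipeline from the review: the `Variant` fields, the read-depth and allele-frequency thresholds, and the use of a single generic damage score are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str            # gene symbol the variant falls in
    depth: int           # read depth at the site (used for QC)
    pop_af: float        # allele frequency in a reference population
    damage_score: float  # hypothetical deleteriousness score, higher = worse


def prioritize(variants, min_depth=10, max_af=0.01):
    """Return gene symbols ranked by their most damaging surviving variant.

    min_depth and max_af are illustrative defaults, not recommended values.
    """
    # Step 1 and 2: quality control, then removal of common variants.
    kept = [v for v in variants if v.depth >= min_depth and v.pop_af <= max_af]

    # Step 3: keep the highest damage score per gene and rank genes by it.
    best = {}
    for v in kept:
        best[v.gene] = max(best.get(v.gene, float("-inf")), v.damage_score)
    return sorted(best, key=lambda g: best[g], reverse=True)


if __name__ == "__main__":
    example = [
        Variant("GENE_A", 30, 0.0001, 25.0),
        Variant("GENE_B", 5, 0.0001, 30.0),   # fails the depth check
        Variant("GENE_C", 40, 0.2, 28.0),     # too common in the population
        Variant("GENE_D", 50, 0.001, 12.0),
    ]
    print(prioritize(example))
```

In practice each stage is far richer than a single threshold (for example, mode of inheritance, gene-level intolerance, and biological relatedness to known disease genes all inform prioritization), so the sketch only conveys the order of operations.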
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Cite this article
Sevim Bayrak, C., Itan, Y. Identifying disease-causing mutations in genomes of single patients by computational approaches. Hum Genet 139, 769–776 (2020). https://doi.org/10.1007/s00439-020-02179-7
DOI: https://doi.org/10.1007/s00439-020-02179-7