A population-based approach for gene prioritization in understanding complex traits

  • Original Investigation
Human Genetics

Abstract

Gene prioritization is the process of determining which variants and genes identified in genetic analyses are likely to cause a disease or a variation in a phenotype. For many genes, neither in vitro nor in vivo testing is available, so assessing their pathogenic role can be challenging and can lead to false-positive or false-negative results. In this paper, we propose a gene prioritization score based on the population of interest. We introduce the concept of the singleton-cohort variant (SC variant): a variant with an allele count equal to one in the cohort under study. The difference between the normalized count of SC variants in the coding region and the normalized count of SC variants in the non-coding region should indicate the level of constraint on that gene in a specific population. This score is negative when constraints allow SC variants only in the non-coding region; conversely, it is positive when there are no constraints. A complementary score is the sum of the normalized SC variant counts in the coding and non-coding regions, which could be used as a proxy for positive or strong purifying selection in a specific population. Our methodology showed a high level of constraint for genes such as USP34 in all subpopulations tested (1000 Genomes dataset). In contrast, some genes showed a strongly negative score only in specific populations, e.g., MYT1L in Europeans, UBR5 in East Asians, and FBXO11 in Africans.
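The two scores described above can be illustrated with a minimal Python sketch. The abstract does not state how SC variant counts are normalized, so the sketch assumes, purely for illustration, that each count is divided by the length of its region in base pairs. The names `GeneCounts`, `normalized_counts`, `dsc_score` and `ssc_score` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class GeneCounts:
    """Singleton-cohort (allele count == 1) variant counts for one gene."""
    sc_coding: int      # SC variants observed in the coding region
    sc_noncoding: int   # SC variants observed in the non-coding region
    coding_bp: int      # coding region length in bp (assumed normalizer)
    noncoding_bp: int   # non-coding region length in bp (assumed normalizer)


def normalized_counts(g: GeneCounts) -> tuple[float, float]:
    """Return (SC_cds, SC_ncds).

    ASSUMPTION: normalization is by region length; the paper's actual
    normalization is not given in the abstract.
    """
    sc_cds = g.sc_coding / g.coding_bp
    sc_ncds = g.sc_noncoding / g.noncoding_bp
    return sc_cds, sc_ncds


def dsc_score(g: GeneCounts) -> float:
    """Delta score: coding minus non-coding normalized SC count.

    Negative when SC variants are relatively depleted in the coding region.
    """
    sc_cds, sc_ncds = normalized_counts(g)
    return sc_cds - sc_ncds


def ssc_score(g: GeneCounts) -> float:
    """Sum score: coding plus non-coding normalized SC count."""
    sc_cds, sc_ncds = normalized_counts(g)
    return sc_cds + sc_ncds


# Made-up counts for a hypothetical gene, not real data.
example = GeneCounts(sc_coding=2, sc_noncoding=40,
                     coding_bp=1_000, noncoding_bp=10_000)
print(dsc_score(example), ssc_score(example))
```

With these invented counts the coding region carries fewer SC variants per base pair than the non-coding region, so the delta score is negative, which is the pattern the abstract associates with constraint.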

Figs. 1–5

Abbreviations

SC variants: Singleton-cohort variants

SC_cds: Normalized singleton-cohort variant count in the gene coding region

SC_ncds: Normalized singleton-cohort variant count in the gene non-coding region

DSC score: Delta singleton-cohort variant score

SSC score: Sum singleton-cohort variant score

Acknowledgements

A sincere thank you to Veronika Collovati and Eleonora Bernucci for proofreading this manuscript. We would like to thank the reviewers for their insightful comments.

Funding

This research was funded by the Italian Ministry of Health (5 × 1000 to Institute for Maternal and Child Health IRCCS “Burlo Garofolo”). The funders had no role in the design of the study, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Corresponding author

Correspondence to Massimo Mezzavilla.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mezzavilla, M., Cocca, M., Guidolin, F. et al. A population-based approach for gene prioritization in understanding complex traits. Hum Genet 139, 647–655 (2020). https://doi.org/10.1007/s00439-020-02152-4

  • DOI: https://doi.org/10.1007/s00439-020-02152-4
