Review Article

DNA methylation-based predictors of health: applications and statistical considerations

Abstract

DNA methylation data have become a valuable source of information for biomarker development because, unlike static genetic risk estimates, DNA methylation varies dynamically in relation to diverse exogenous and endogenous factors, including environmental risk factors and complex disease pathology. Reliable methods for genome-wide measurement at scale have led to the proliferation of epigenome-wide association studies and, subsequently, to the development of DNA methylation-based predictors across a wide range of health-related applications, from the identification of risk factors or exposures, such as age and smoking, to the early detection of disease, or of disease progression, in cancer, cardiovascular disease and neurological disease. This Review evaluates the progress of existing DNA methylation-based predictors, including the contribution of machine learning techniques, and assesses the uptake of key statistical best practices needed to ensure their reliable performance, such as data-driven feature selection, elimination of data leakage in performance estimates and use of generalizable, adequately powered training samples.
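Why data leakage in feature selection matters can be illustrated with a small simulation. The sketch below is a hypothetical illustration, not a method taken from this Review: it generates Gaussian "CpG" values that have no true association with the outcome, builds a simple DNAm score as a weighted sum of the top-k CpGs ranked by absolute correlation (a stand-in for an EWAS-derived score; real studies more often fit penalized regression such as elastic net on standardized beta or M values), and compares out-of-fold performance when CpGs are selected on all samples before cross-validation against selection repeated inside each training fold. All function names, sample sizes and the choice of k are illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def simulate_null(n_samples, n_cpgs, seed=1):
    """Hypothetical data: Gaussian 'CpG' values with NO true association with the outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_cpgs)] for _ in range(n_samples)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return X, y


def select_cpgs(X, y, rows, k):
    """Rank CpGs by |r| with y using ONLY the given rows; return the top k as (column, weight)."""
    rows = list(rows)
    ys = [y[i] for i in rows]
    ranked = []
    for j in range(len(X[0])):
        r = pearson([X[i][j] for i in rows], ys)
        ranked.append((abs(r), j, r))
    ranked.sort(reverse=True)
    return [(j, r) for _, j, r in ranked[:k]]


def dnam_score(row, weights):
    """Weighted sum of selected CpG values (a simple stand-in for a DNAm predictor)."""
    return sum(w * row[j] for j, w in weights)


def cv_correlation(X, y, k, n_folds, leaky, seed=0):
    """Correlation between out-of-fold DNAm scores and y.

    leaky=True:  CpGs are selected once on ALL samples, so every held-out fold
                 influenced which features were kept (data leakage).
    leaky=False: selection and weighting are redone inside each training fold.
    """
    n = len(y)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    chosen_all = [j for j, _ in select_cpgs(X, y, range(n), k)] if leaky else None
    preds = [0.0] * n
    for f, held_out in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        if leaky:
            # Only the weights are refit; the CpG list already saw the held-out samples.
            ys = [y[i] for i in train]
            weights = [(j, pearson([X[i][j] for i in train], ys)) for j in chosen_all]
        else:
            weights = select_cpgs(X, y, train, k)
        for i in held_out:
            preds[i] = dnam_score(X[i], weights)
    return pearson(preds, y)


if __name__ == "__main__":
    X, y = simulate_null(n_samples=100, n_cpgs=1000)
    leaky_r = cv_correlation(X, y, k=20, n_folds=5, leaky=True)
    proper_r = cv_correlation(X, y, k=20, n_folds=5, leaky=False)
    print(f"selection before CV (leaky):  r = {leaky_r:.2f}")
    print(f"selection inside CV (proper): r = {proper_r:.2f}")
```

Because the simulated outcome is pure noise, any sizeable out-of-fold correlation in the leaky run reflects information from the held-out samples seeping into feature selection, whereas the in-fold estimate should sit near zero. In a real analysis, wrapping feature selection and model fitting in a single scikit-learn `Pipeline` and evaluating it with `cross_val_score` enforces the same in-fold discipline.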

Fig. 1: Illustrative distribution of application areas for studies of DNA methylation (DNAm) prediction of health risk factors and exposures by DNA source tissue for the majority of relevant studies published before April 2020.
Fig. 2: Illustrative distributions for studies of DNA methylation (DNAm) prediction of health outcomes for the majority of relevant studies published before April 2020.

References

  1. GBD 2019 Diseases and Injuries Collaborators. Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet 396, 1204–1222 (2020).

    Article  Google Scholar 

  2. Yengo, L. et al. Meta-analysis of genome-wide association studies for height and body mass index in ~700 000 individuals of European ancestry. Hum. Mol. Genet. 27, 3641–3649 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Khera, A. V. et al. Polygenic prediction of weight and obesity trajectories from birth to adulthood. Cell 177, 587–596.e9 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Roberts, N. J. et al. The predictive capacity of personal genome sequencing. Sci. Transl. Med. 4, 133ra58 (2012).

    Article  PubMed  PubMed Central  Google Scholar 

  6. Adeyemo, A. et al. Responsible use of polygenic risk scores in the clinic: potential benefits, risks and gaps. Nat. Med. 27, 1876–1884 (2021).

    Article  CAS  Google Scholar 

  7. Timpson, N. J., Greenwood, C. M. T., Soranzo, N., Lawson, D. J. & Richards, J. B. Genetic architecture: the shape of the genetic contribution to human traits and disease. Nat. Rev. Genet. 19, 110–124 (2017).

    Article  PubMed  CAS  Google Scholar 

  8. Ala-Korpela, M. & Holmes, M. V. Polygenic risk scores and the prediction of common diseases. Int. J. Epidemiol. 49, 1–3 (2020).

    Article  PubMed  Google Scholar 

  9. Cavalli, G. & Heard, E. Advances in epigenetics link genetics to the environment and disease. Nature 571, 489–499 (2019).

    Article  CAS  PubMed  Google Scholar 

  10. Teschendorff, A. E. et al. Age-dependent DNA methylation of genes that are suppressed in stem cells is a hallmark of cancer. Genome Res. 20, 440–446 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. Petronis, A. Epigenetics as a unifying principle in the aetiology of complex traits and diseases. Nature 465, 721–727 (2010).

    Article  CAS  PubMed  Google Scholar 

  12. Baubec, T. & Schübeler, D. Genomic patterns and context specific interpretation of DNA methylation. Curr. Opin. Genet. Dev. 25, 85–92 (2014).

    Article  CAS  PubMed  Google Scholar 

  13. Bird, A. DNA methylation patterns and epigenetic memory. Genes Dev. 16, 6–21 (2002).

    Article  CAS  PubMed  Google Scholar 

  14. Kim, M. & Costello, J. DNA methylation: an epigenetic mark of cellular memory. Exp. Mol. Med. 49, 49 (2017).

    Article  Google Scholar 

  15. Russo, V. E. A., Martienssen, R. A. & Riggs, A. D. Epigenetic Mechanisms of Gene Regulation (Cold Spring Harbor laboratory Press, 1996).

  16. Lappalainen, T. & Greally, J. M. Associating cellular epigenetic models with human phenotypes. Nat. Rev. Genet. 18, 441–451 (2017).

    Article  CAS  PubMed  Google Scholar 

  17. Hou, L., Zhang, X., Wang, D. & Baccarelli, A. Environmental chemical exposures and human epigenetics. Int. J. Epidemiol. 41, 79–105 (2012).

    Article  PubMed  Google Scholar 

  18. Perera, F. & Herbstman, J. Prenatal environmental exposures, epigenetics, and disease. Reprod. Toxicol. 31, 363–373 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Laird, P. W. Principles and challenges of genomewide DNA methylation analysis. Nat. Rev. Genet. 11, 191–203 (2010).

    Article  CAS  PubMed  Google Scholar 

  20. Foley, D. L. et al. Prospects for epigenetic epidemiology. Am. J. Epidemiol. 169, 389–400 (2009).

    Article  PubMed  PubMed Central  Google Scholar 

  21. Moran, S., Arribas, C. & Esteller, M. Validation of a DNA methylation microarray for 850,000 CpG sites of the human genome enriched in enhancer sequences. Epigenomics 8, 389–399 (2015).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  22. Sandoval, J. et al. Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome. Epigenetics 6, 692–702 (2011).

    Article  CAS  PubMed  Google Scholar 

  23. Horvath, S. & Raj, K. DNA methylation-based biomarkers and the epigenetic clock theory of ageing. Nat. Rev. Genet. 19, 371–384 (2018).

    Article  CAS  PubMed  Google Scholar 

  24. Bell, C. G. et al. DNA methylation aging clocks: challenges and recommendations. Genome Biol. 20, 249 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  25. McRae, A. F. et al. Contribution of genetic variation to transgenerational inheritance of DNA methylation. Genome Biol. 15, R73 (2014).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  26. McCartney, D. L. et al. Epigenetic prediction of complex traits and death. Genome Biol. 19, 136 (2018). This paper systematically demonstrates that DNAm could predict a whole range of risk factors and exposures, with explanatory capacity roughly equal to or better than polygenic risk predictors.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  27. Horvath, S. DNA methylation age of human tissues and cell types. Genome Biol. 14, R115 (2013). This early epigenetic clock is broadly applicable owing to its multi-tissue training set and accordingly saw widespread use as a biomarker of biological ageing in many epidemiological studies.

    Article  PubMed  PubMed Central  Google Scholar 

  28. Bocklandt, S. et al. Epigenetic predictor of age. PLoS ONE 6, e14821 (2011). This is the first paper to report a DNAm predictor of age, or epigenetic clock.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Hannum, G. et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol. Cell 49, 359–367 (2013).

    Article  CAS  PubMed  Google Scholar 

  30. Crimmins, E. M., Thyagarajan, B., Levine, M. E., Weir, D. R. & Faul, J. Associations of age, sex, race/ethnicity and education with 13 epigenetic clocks in a nationally representative US sample: the Health and Retirement Study. J. Gerontol. Ser. A biol. Sci. Med. Sci. 76, 1117–1123 (2021).

    Article  Google Scholar 

  31. Rakyan, V. K. et al. Human aging-associated DNA hypermethylation occurs preferentially at bivalent chromatin domains. Genome Res. 20, 434–439 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Boks, M. P. et al. Longitudinal changes of telomere length and epigenetic age related to traumatic stress and post-traumatic stress disorder. Psychoneuroendocrinology 51, 506–512 (2015).

    Article  CAS  PubMed  Google Scholar 

  33. Zannas, A. S. et al. Lifetime stress accelerates epigenetic aging in an urban, African American cohort: relevance of glucocorticoid signaling. Genome Biol. 16, 266 (2015).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  34. Horvath, S. et al. Obesity accelerates epigenetic aging of human liver. Proc. Natl Acad. Sci. USA 111, 15538–15543 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Marioni, R. E. et al. The epigenetic clock is correlated with physical and cognitive fitness in the Lothian Birth Cohort 1936. Int. J. Epidemiol. 44, 1388–1396 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  36. Levine, M. E. et al. DNA methylation age of blood predicts future onset of lung cancer in the women’s health initiative. Aging 7, 690–700 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Marioni, R. E. et al. DNA methylation age of blood predicts all-cause mortality in later life. Genome Biol. 16, 25 (2015).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  38. Marioni, R. E. et al. The epigenetic clock and telomere length are independently associated with chronological age and mortality. Int. J. Epidemiol. 45, 424–432 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  39. Jaffe, A. E. & Irizarry, R. A. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol. 15, R31 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  40. Horvath, S. & Ritz, B. R. Increased epigenetic age and granulocyte counts in the blood of Parkinson’s disease patients. Aging 7, 1130–1142 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  41. Zhang, Q. et al. Improved precision of epigenetic clock estimates across tissues and its implication for biological ageing. Genome Med. 11, 887–897 (2019).

    Article  Google Scholar 

  42. Levine, M. E. et al. An epigenetic biomarker of aging for lifespan and healthspan. Aging 10, 573–591 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  43. Lu, A. T. et al. DNA methylation GrimAge strongly predicts lifespan and healthspan. Aging 11, 303 (2019). This paper presents an influential second-generation epigenetic clock and demonstrates that DNAm predictors of molecular phenotypes, risk factors and exposures can be usefully combined.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  44. Belsky, D. W. W. et al. Quantification of the pace of biological aging in humans through a blood test, the DunedinPoAm DNA methylation algorithm. eLife 9, e54870 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Lu, A. T. et al. GWAS of epigenetic aging rates in blood reveals a critical role for TERT. Nat. Commun. 9, 387 (2018).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  46. Gibson, J. et al. A meta-analysis of genome-wide association studies of epigenetic age acceleration. PLoS Genet. 15, e1008104 (2019).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  47. McCartney, D. L. et al. Genome-wide association studies identify 137 genetic loci for DNA methylation biomarkers of aging. Genome Biol. 22, 1–25 (2021).

    Article  CAS  Google Scholar 

  48. Vetter, V. M. et al. Epigenetic clock and relative telomere length represent largely different aspects of aging in the Berlin aging study II (BASE-II). J. Gerontol. A Biol. Sci. Med. Sci. 74, 27–32 (2019).

    Article  CAS  PubMed  Google Scholar 

  49. Joehanes, R. et al. Epigenetic signatures of cigarette smoking. Circ. Cardiovasc. Genet. 9, 436–447 (2016). This paper is the largest EWAS on cigarette smoking in adults with almost 16,000 participants and identifies differential DNAm between current and never smokers at 2,623 CpG sites.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Zeilinger, S. et al. Tobacco smoking leads to extensive genome-wide changes in DNA methylation. PLoS ONE 8, e63812 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Guida, F. et al. Dynamics of smoking-induced genome-wide methylation changes with time since smoking cessation. Hum. Mol. Genet. 24, 2349–2359 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Maas, S. C. E. et al. Validated inference of smoking habits from blood with a finite DNA methylation marker set. Eur. J. Epidemiol. 34, 1055–1074 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. McCartney, D. L. et al. Epigenetic signatures of starting and stopping smoking. EBioMedicine 37, 214–220 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  54. Corley, J. et al. Epigenetic signatures of smoking associate with cognitive function, brain structure, and mental and physical health outcomes in the Lothian Birth Cohort 1936. Transl. Psychiatry 9, 248 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  55. Su, D. et al. Distinct epigenetic effects of tobacco smoking in whole blood and among leukocyte subtypes. PLoS ONE 11, e0166486 (2016).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  56. You, C. et al. A cell-type deconvolution meta-analysis of whole blood EWAS reveals lineage-specific smoking-associated DNA methylation changes. Nat. Commun. 11, 4779 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  57. Benowitz, N. L. et al. Biochemical verification of tobacco use and abstinence: 2019 update. Nicotine Tob. Res. 22, 1086–1097 (2020).

    Article  PubMed  Google Scholar 

  58. Richmond, R. C., Suderman, M., Langdon, R., Relton, C. L., & Davey Smith, G. DNA methylation as a marker for prenatal smoke exposure in adults. Int. J. Epidemiol. 47, 1120–1130 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  59. Wiklund, P. et al. DNA methylation links prenatal smoking exposure to later life health outcomes in offspring. Clin. Epigenetics 11, 97 (2019).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  60. Bojesen, S. E., Timpson, N., Relton, C., Davey Smith, G. & Nordestgaard, B. G. AHRR (cg05575921) hypomethylation marks smoking behaviour, morbidity and mortality. Thorax 72, 646–653 (2017). This paper provides a clear example of how DNAm can proxy an established risk factor and out-perform the measurement of that risk factor in predicting morbidity and mortality.

    Article  PubMed  Google Scholar 

  61. Tu, W., Chu, C., Li, S. & Liangpunsakul, S. Development and validation of a composite score for excessive alcohol use screening. J. Investig. Med. 64, 1006–1011 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  62. Joubert, B. R. et al. DNA methylation in newborns and maternal smoking in pregnancy: genome-wide consortium meta-analysis. Am. J. Hum. Genet. 98, 680–696 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  63. Liu, C. et al. A DNA methylation biomarker of alcohol consumption. Mol. Psychiatry 23, 422–433 (2018).

    Article  CAS  PubMed  Google Scholar 

  64. Clarke, T. K. et al. Genome-wide association study of alcohol consumption and genetic overlap with other health-related traits in UK biobank (N = 112117). Mol. Psychiatry 22, 1376–1384 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  65. Taylor, M., Simpkin, A. J., Haycock, P. C., Dudbridge, F. & Zuccolo, L. Exploration of a polygenic risk score for alcohol consumption: a longitudinal analysis from the ALSPAC cohort. PLoS ONE 11, e0167360 (2016).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  66. Philibert, R., Dogan, M., Beach, S. R. H., Mills, J. A. & Long, J. D. AHRR methylation predicts smoking status and smoking intensity in both saliva and blood DNA. Am. J. Med. Genet. B Neuropsychiatr. Genet. 183, 51–60 (2020).

    Article  CAS  PubMed  Google Scholar 

  67. Yousefi, P. D. et al. Validation and characterisation of a DNA methylation alcohol biomarker across the life course. Clin. Epigenetics 11, 163 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  68. Wahl, S. et al. Epigenome-wide association study of body mass index, and the adverse outcomes of adiposity. Nature 541, 81–86 (2017). This paper provided an early demonstration of the value of DNAm predictors in relation to disease discrimination, by showing that a DNAm score for BMI is associated with incident type 2 diabetes.

    Article  CAS  PubMed  Google Scholar 

  69. Dick, K. J. et al. DNA methylation and body-mass index: a genome-wide analysis. Lancet 383, 1990–1998 (2014).

    Article  CAS  PubMed  Google Scholar 

  70. Mendelson, M. M. et al. Association of body mass index with DNA methylation and gene expression in blood cells and relations to cardiometabolic disease: a Mendelian randomization approach. PLoS Med. 14, e1002215 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  71. Reed, Z. E., Suderman, M. J., Relton, C. L., Davis, O. S. P. & Hemani, G. The association of DNA methylation with body mass index: distinguishing between predictors and biomarkers. Clin. Epigenetics 12, 50 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  72. Keller, M. et al. DNA methylation signature in blood mirrors successful weight-loss during lifestyle interventions: the CENTRAL trial. Genome Med. 12, 97 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  73. Crocker, K. C. et al. DNA methylation and adiposity phenotypes: an epigenome-wide association study among adults in the Strong Heart Study. Int. J. Obes. 44, 2313–2322 (2020).

    Article  Google Scholar 

  74. Justice, A. E. et al. Methylome-wide association study of central adiposity implicates genes involved in immune and endocrine systems. Epigenomics 12, 1483–1499 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  75. Vehmeijer, F. O. L. et al. DNA methylation and body mass index from birth to adolescence: meta-analyses of epigenome-wide association studies. Genome Med. 12, 105 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  76. Mandaviya, P. R. et al. Association of dietary folate and vitamin B-12 intake with genome-wide DNA methylation in blood: a large-scale epigenome-wide association analysis in 5841 individuals. Am. J. Clin. Nutr. 110, 437–450 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  77. Gensous, N. et al. One-year Mediterranean diet promotes epigenetic rejuvenation with country- and sex-specific effects: a pilot study from the NU-AGE project. GeroScience 42, 687–701 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  78. Ma, J. et al. Whole blood DNA methylation signatures of diet are associated with cardiovascular disease risk factors and all-cause mortality. Circ. Genom. Precis. Med. 13, 324–333 (2020).

    Article  CAS  Google Scholar 

  79. Do, W. L. et al. Epigenome-wide association study of diet quality in the Women’s Health Initiative and TwinsUK cohort. Int. J. Epidemiol. 50, 675–684 (2021).

    Article  PubMed  Google Scholar 

  80. Gomez-Alonso, M del C. et al. DNA methylation and lipid metabolism: an EWAS of 226 metabolic measures. Clin. Epigenetics 13, 7 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Antoun, E. et al. Maternal dysglycaemia, changes in the infant’s epigenome modified with a diet and physical activity intervention in pregnancy: secondary analysis of a randomised control trial. PLoS Med. 17, e1003229 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  82. Irwin, R. E. et al. A randomized controlled trial of folic acid intervention in pregnancy highlights a putative methylation-regulated control element at ZFP57. Clin. Epigenetics 11, 31 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  83. Sharp, G. C. et al. Maternal BMI at the start of pregnancy and offspring epigenome-wide DNA methylation: findings from the pregnancy and childhood epigenetics (PACE) consortium. Hum. Mol. Genet. 26, 4067–4085 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  84. Howe, C. G. et al. Maternal gestational diabetes and newborn DNA methylation: findings from the Pregnancy and Childhood Epigenetics consortium. Diabetes Care 43, dc190524 (2019).

    Google Scholar 

  85. Ouidir, M. et al. Early pregnancy dyslipidemia is associated with placental DNA methylation at loci relevant for cardiometabolic diseases. Epigenomics 12, 921–934 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  86. Agha, G. et al. Adiposity is associated with DNA methylation profile in adipose tissue. Int. J. Epidemiol. 44, 1277–1287 (2015).

    Article  PubMed  Google Scholar 

  87. Huang, Y. T. et al. Epigenome-wide profiling of DNA methylation in paired samples of adipose tissue and blood. Epigenetics 11, 227–236 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  88. Allum, F. et al. Dissecting features of epigenetic variants underlying cardiometabolic risk using full-resolution epigenome profiling in regulatory elements. Nat. Commun. 10, 1209 (2019).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  89. Richmond, R. C. et al. DNA methylation and BMI: investigating identified methylation sites at HIF3A in a causal framework. Diabetes 65, 1231–1244 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  90. Sun, D. et al. Body mass index drives changes in DNA methylation: a longitudinal study. Circ. Res. 125, 824–833 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  91. Gudsnuk, K. & Champagne, F. A. Epigenetic influence of stress and the social environment. ILAR J. 53, 279–288 (2012).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  92. Cunliffe, V. T. The epigenetic impacts of social stress: how does social adversity become biologically embedded? Epigenomics 8, 1653–1669 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  93. Borghol, N. et al. Associations with early-life socio-economic position in adult DNA methylation. Int. J. Epidemiol. 41, 62–74 (2012).

    Article  PubMed  Google Scholar 

  94. Chen, D., Meng, L., Pei, F., Zheng, Y. & Leng, J. A review of DNA methylation in depression. J. Clin. Neurosci. 43, 39–46 (2017).

    Article  CAS  PubMed  Google Scholar 

  95. Vukojevic, V. et al. Epigenetic modification of the glucocorticoid receptor gene is linked to traumatic memory and post-traumatic stress disorder risk in genocide survivors. J. Neurosci. 34, 10274–10284 (2014).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  96. Yehuda, R. et al. Lower methylation of glucocorticoid receptor gene promoter 1F in peripheral blood of veterans with posttraumatic stress disorder. Biol. Psychiatry 77, 356–364 (2015).

    Article  CAS  PubMed  Google Scholar 

  97. Non, A. L. et al. DNA methylation at stress-related genes is associated with exposure to early life institutionalization. Am. J. Phys. Anthropol. 161, 84–93 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  98. McGowan, P. O. et al. Epigenetic regulation of the glucocorticoid receptor in human brain associates with childhood abuse. Nat. Neurosci. 12, 342–348 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  99. Suderman, M. et al. Childhood abuse is associated with methylation of multiple loci in adult DNA. BMC Med. Genomics 7, 13 (2014).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  100. Hostinar, C. E., Sullivan, R. M. & Gunnar, M. R. Psychobiological mechanisms underlying the social buffering of the hypothalamic-pituitary-adrenocortical axis: a review of animal models and human studies across development. Psychol. Bull. 140, 256–282 (2014).

    Article  PubMed  Google Scholar 

  101. Swartz, J. R., Hariri, A. R. & Williamson, D. E. An epigenetic mechanism links socioeconomic status to changes in depression-related brain function in high-risk adolescents. Mol. Psychiatry 22, 209–214 (2017).

    Article  CAS  PubMed  Google Scholar 

  102. Clark, S. L. et al. A methylation study of long-term depression risk. Mol. Psychiatry 25, 1334–1343 (2020).

    Article  CAS  PubMed  Google Scholar 

  103. Barbu, M. C. et al. Epigenetic prediction of major depressive disorder. Mol. Psychiatry 26, 5112–5123 (2021).

    Article  CAS  PubMed  Google Scholar 

  104. Clive, M. L. et al. Discovery and replication of a peripheral tissue DNA methylation biosignature to augment a suicide prediction model. Clin. Epigenetics 8, 113 (2016).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  105. Yang, X., Gao, L. & Zhang, S. Comparative pan-cancer DNA methylation analysis reveals cancer common and specific patterns. Brief. Bioinform. 18, 761–773 (2017).

    CAS  PubMed  Google Scholar 

  106. Zhang, J. & Huang, K. Pan-cancer analysis of frequent DNA co-methylation patterns reveals consistent epigenetic landscape changes in multiple cancers. BMC Genomics 18, 1045 (2017).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  107. Tao, Y. et al. Aging-like spontaneous epigenetic silencing facilitates Wnt activation, stemness, and Braf V600E -induced tumorigenesis. Cancer Cell 35, 315–328.e6 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  108. Chen, Y. et al. MGMT promoter methylation and glioblastoma prognosis: a systematic review and meta-analysis. Arch. Med. Res. 44, 281–290 (2013).

    Article  CAS  PubMed  Google Scholar 

  109. Wick, W. et al. Temozolomide chemotherapy alone versus radiotherapy alone for malignant astrocytoma in the elderly: the NOA-08 randomised, phase 3 trial. Lancet Oncol. 13, 707–715 (2012).

    Article  CAS  PubMed  Google Scholar 

  110. Malmström, A. et al. Temozolomide versus standard 6-week radiotherapy versus hypofractionated radiotherapy in patients older than 60 years with glioblastoma: the Nordic randomised, phase 3 trial. Lancet Oncol. 13, 916–926 (2012).

    Article  PubMed  CAS  Google Scholar 

  111. Loeb, S. et al. Overdiagnosis and overtreatment of prostate cancer. Eur. Urol. 65, 1046–1055 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  112. Jørgensen, K. J. & Gøtzsche, P. C. Overdiagnosis in publicly organised mammography screening programmes: systematic review of incidence trends. BMJ 339, 206–209 (2009).

    Article  Google Scholar 

  113. Hulbert, A. et al. Early detection of lung cancer using DNA promoter hypermethylation in plasma and sputum. Clin. Cancer Res. 23, 1998–2005 (2017).

    Article  CAS  PubMed  Google Scholar 

  114. Li, L. et al. Diagnosis of pulmonary nodules by DNA methylation analysis in bronchoalveolar lavage fluids. Clin. Epigenetics 13, 185 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  115. Dvorská, D. et al. Aberrant methylation status of tumour suppressor genes in ovarian cancer tissue and paired plasma samples. Int. J. Mol. Sci. 20, 4119 (2019).

    Article  PubMed Central  CAS  Google Scholar 

  116. Majumder, S. et al. Novel methylated DNA markers discriminate advanced neoplasia in pancreatic cysts: marker discovery, tissue validation, and cyst fluid testing. Am. J. Gastroenterol. 114, 1539–1549 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  117. Sanchez-Cespedes, M. et al. Gene promoter hypermethylation in tumors and serum of head and neck cancer patients. Cancer Res. 60, 892–895 (2000).

    CAS  PubMed  Google Scholar 

  118. Nakahara, Y., Shintani, S., Mihara, M., Hino, S. & Hamakawa, H. Detection of p16 promoter methylation in the serum of oral cancer patients. Int. J. Oral. Maxillofac. Surg. 35, 362–365 (2006).

    Article  CAS  PubMed  Google Scholar 

  119. Nakayama, H. et al. Molecular detection of p16 promoter methylation in the serum of colorectal cancer patients. Cancer Lett. 188, 115–119 (2002).

    Article  CAS  PubMed  Google Scholar 

  120. Ooki, A. et al. A panel of novel detection and prognostic methylated DNA markers in primary non–small cell lung cancer and serum DNA. Clin. Cancer Res. 23, 7141–7152 (2017).

    Article  CAS  PubMed  Google Scholar 

  121. Guan, Z. et al. Individual and joint performance of DNA methylation profiles, genetic risk score and environmental risk scores for predicting breast cancer risk. Mol. Oncol. 14, 42–53 (2020).

    Article  CAS  PubMed  Google Scholar 

  122. Onwuka, J. U. et al. A panel of DNA methylation signature from peripheral blood may predict colorectal cancer susceptibility. BMC Cancer 20, 692 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  123. Walker, R. M. et al. Epigenome-wide analyses identify DNA methylation signatures of dementia risk. Alzheimer’s Dement. 12, e12078 (2020).

    Google Scholar 

  124. Baglietto, L. et al. DNA methylation changes measured in pre-diagnostic peripheral blood samples are associated with smoking and lung cancer risk. Int. J. Cancer 140, 50–61 (2017).

    Article  CAS  PubMed  Google Scholar 

  125. Zhang, Y. et al. Smoking-associated DNA methylation markers predict lung cancer incidence. Clin. Epigenetics 8, 127 (2016).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  126. Wang, L. et al. Methylation markers for small cell lung cancer in peripheral blood leukocyte DNA. J. Thorac. Oncol. 5, 778–785 (2010).

    Article  PubMed  PubMed Central  Google Scholar 

  127. Pedersen, K. S. et al. Leukocyte DNA methylation signature differentiates pancreatic cancer patients from healthy controls. PLoS ONE 6, e18223 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  128. Michaud, D. S. et al. Epigenome-wide association study using prediagnostic bloods identifies new genomic regions associated with pancreatic cancer risk. JNCI Cancer Spectr. 4, pkaa041 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  129. Xu, R. H. et al. Circulating tumour DNA methylation markers for diagnosis and prognosis of hepatocellular carcinoma. Nat. Mater. 16, 1155–1162 (2017).

    Article  CAS  PubMed  Google Scholar 

  130. Shen, S. Y. et al. Sensitive tumour detection and classification using plasma cell-free DNA methylomes. Nature 563, 579–583 (2018).

    Article  CAS  PubMed  Google Scholar 

  131. Roy, D. & Tiirikainen, M. Diagnostic power of DNA methylation classifiers for early detection of cancer. Trends cancer 6, 78–81 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  132. Liu, M. C. et al. Sensitive and specific multi-cancer detection and localization using methylation signatures in cell-free DNA. Ann. Oncol. 31, 745–759 (2020).

    Article  CAS  PubMed  Google Scholar 

  133. Nassiri, F. et al. Detection and discrimination of intracranial tumors using plasma cell-free DNA methylomes. Nat. Med. 26, 1044–1047 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  134. Nuzzo, P. V. et al. Detection of renal cell carcinoma using plasma and urine cell-free DNA methylomes. Nat. Med. 26, 1041–1043 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  135. Guler, G. D. et al. Detection of early stage pancreatic cancer using 5-hydroxymethylcytosine signatures in circulating cell free DNA. Nat. Commun. 11, 5270 (2020).

  136. Tse, R. T.-H. et al. Urinary cell-free DNA in bladder cancer detection. Diagnostics 11, 306 (2021).

  137. Luo, H. et al. Circulating tumor DNA methylation profiles enable early diagnosis, prognosis prediction, and screening for colorectal cancer. Sci. Transl. Med. 12, eaax7533 (2020).

  138. NHS. NHS to pilot potentially revolutionary blood test that detects more than 50 cancers. https://www.england.nhs.uk/2020/11/nhs-to-pilot-potentially-revolutionary-blood-test/ (2021).

  139. Klein, E. A. et al. Clinical validation of a targeted methylation-based multi-cancer early detection test using an independent validation set. Ann. Oncol. 32, 1167–1177 (2021). This study demonstrates the ability of cell-free DNA polymorphisms and DNAm to discriminate >50 cancer types and tissue of origin.

  140. Richard, M. A. et al. DNA methylation analysis identifies loci for blood pressure regulation. Am. J. Hum. Genet. 101, 888–902 (2017). The largest blood pressure EWAS to date, with information from more than 17,000 participants, which found that a 13-CpG score explained only about 1% and 2% of the variance in systolic and diastolic blood pressure, respectively.

  141. Huang, Y. et al. Identification, heritability, and relation with gene expression of novel DNA methylation loci for blood pressure. Hypertension 76, 195–205 (2020).

  142. Fernández-Sanlés, A., Sayols-Baixeras, S., Subirana, I., Degano, I. R. & Elosua, R. Association between DNA methylation and coronary heart disease or other atherosclerotic events: a systematic review. Atherosclerosis 263, 325–333 (2017).

  143. Westerman, K. et al. DNA methylation modules associate with incident cardiovascular disease and cumulative risk factor exposure. Clin. Epigenetics 11, 142 (2019).

  144. Shen, Y. et al. Epigenome-wide association study indicates hypomethylation of MTRNR2L8 in large-artery atherosclerosis stroke. Stroke 50, 1330–1338 (2019).

  145. Dogan, M. V., Grumbach, I. M., Michaelson, J. J. & Philibert, R. A. Integrated genetic and epigenetic prediction of coronary heart disease in the Framingham Heart Study. PLoS ONE 13, e0190549 (2018).

  146. Westerman, K. et al. Epigenomic assessment of cardiovascular disease risk and interactions with traditional risk metrics. J. Am. Heart Assoc. 9, e015299 (2020).

  147. Nuotio, M. L. et al. An epigenome-wide association study of metabolic syndrome and its components. Sci. Rep. 10, 20567 (2020).

  148. Chambers, J. C. et al. Epigenome-wide association of DNA methylation markers in peripheral blood from Indian Asians and Europeans with incident type 2 diabetes: a nested case–control study. Lancet Diabetes Endocrinol. 3, 526–534 (2015).

  149. Cardona, A. et al. Epigenome-wide association study of incident type 2 diabetes in a British population: EPIC-Norfolk study. Diabetes 68, 2315–2326 (2019).

  150. Xu, C. et al. Elevated methylation of OPRM1 and OPRL1 genes in Alzheimer’s disease. Mol. Med. Rep. 18, 4297–4302 (2018).

  151. Wang, C., Chen, L., Yang, Y., Zhang, M. & Wong, G. Identification of potential blood biomarkers for Parkinson’s disease by gene expression and DNA methylation data integration analysis. Clin. Epigenetics 11, 24 (2019).

  152. Osborne, L. et al. Replication of epigenetic postpartum depression biomarkers and variation with hormone levels. Neuropsychopharmacology 41, 1648–1658 (2016).

  153. Guintivano, J., Arad, M., Gould, T. D., Payne, J. L. & Kaminsky, Z. A. Antenatal prediction of postpartum depression with blood DNA methylation biomarkers. Mol. Psychiatry 19, 560–567 (2014).

  154. Boks, M. P. et al. SKA2 methylation is involved in cortisol stress reactivity and predicts the development of post-traumatic stress disorder (PTSD) after military deployment. Neuropsychopharmacology 41, 1350–1356 (2016).

  155. Kaminsky, Z. et al. A multi-tissue analysis identifies HLA complex group 9 gene methylation differences in bipolar disorder. Mol. Psychiatry 17, 728–740 (2012).

  156. Howsmon, D. P., Kruger, U., Melnyk, S., James, S. J. & Hahn, J. Classification and adaptive behavior prediction of children with autism spectrum disorder based upon multivariate data analysis of markers of oxidative stress and DNA methylation. PLoS Comput. Biol. 13, e1005385 (2017).

  157. Ju, C. et al. Integrated genome-wide methylation and expression analyses reveal functional predictors of response to antidepressants. Transl. Psychiatry 9, 1–12 (2019).

  158. Kuhn, M. & Johnson, K. Feature Engineering and Selection: a Practical Approach for Predictive Models (CRC Press, 2019).

  159. Zhang, Y., Florath, I., Saum, K. U. & Brenner, H. Self-reported smoking, serum cotinine, and blood DNA methylation. Environ. Res. 146, 395–403 (2016).

  160. Rhead, B. et al. Rheumatoid arthritis naive T cells share hypermethylation sites with synoviocytes. Arthritis Rheumatol. 69, 550–559 (2017).

  161. Ligthart, S. et al. DNA methylation signatures of chronic low-grade inflammation are associated with complex diseases. Genome Biol. 17, 255 (2016).

  162. Shi, J. et al. Winner’s curse correction and variable thresholding improve performance of polygenic risk modeling based on genome-wide association study summary-level data. PLoS Genet. 12, e100649 (2016).

  163. Kundu, S. AI in medicine must be explainable. Nat. Med. 27, 1328 (2021).

  164. Dye, C. K. et al. Comparative DNA methylomic analyses reveal potential origins of novel epigenetic biomarkers of insulin resistance in monocytes from virally suppressed HIV-infected adults. Clin. Epigenetics 11, 95 (2019).

  165. Shen, F. et al. Identification of CD28 and PTEN as novel prognostic markers for cervical cancer. J. Cell. Physiol. 234, 7004–7011 (2019).

  166. James, G., Witten, D., Hastie, T. & Tibshirani, R. An Introduction to Statistical Learning (Springer, 2013). This is a standard introductory text on machine learning modelling that assumes some mathematical background and includes applied programming tutorials.

  167. Hattab, M. W., Clark, S. L. & van den Oord, E. J. C. G. Overestimation of the classification accuracy of a biomarker for assessing heavy alcohol use. Mol. Psychiatry 23, 2114–2115 (2018). This letter identifies and clearly articulates the issue of data leakage that impacted the approach and inflated the performance statistics of several early DNAm predictors, particularly those developed from large EWAS meta-analyses.

  168. Steyerberg, E. W. et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology 21, 128–138 (2010).

  169. Cohen, J. Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit. Psychol. Bull. 70, 213–220 (1968).

  170. Jurman, G., Riccadonna, S. & Furlanello, C. A comparison of MCC and CEN error measures in multi-class prediction. PLoS ONE 7, e41882 (2012).

  171. Simpkin, A. J., Suderman, M. & Howe, L. D. Epigenetic clocks for gestational age: statistical and study design considerations. Clin. Epigenetics 9, 100 (2017).

  172. Mills, M. C. & Rahal, C. A scientometric review of genome-wide association studies. Commun. Biol. 2, 9 (2019).

  173. Chen, I. Y. et al. Ethical machine learning in health care. Annu. Rev. Biomed. Data Sci. 4, 123–144 (2021). This review identifies the many different ways that uncritical development of prediction models of health characteristics can entrench and exacerbate disparities for vulnerable populations.

  174. Mitchell, M. et al. Model cards for model reporting. In FAT* ‘19: Proceedings of the Conference on Fairness, Accountability, and Transparency 220–229 (ACM, 2018).

  175. Thomas, R. & Uminsky, D. The problem with metrics is a fundamental problem for AI. Preprint at arXiv https://arxiv.org/abs/2002.08512 (2020).

  176. Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning. Data Mining, Inference, and Prediction, Second Edition (Springer Science & Business Media, 2009). This is a canonical text on theoretical and applied machine learning with detailed introductions to linear modelling, many common supervised and unsupervised learning methods, and design considerations for prediction modelling.

  177. Greener, J. G., Kandathil, S. M., Moffat, L. & Jones, D. T. A guide to machine learning for biologists. Nat. Rev. Mol. Cell Biol. 23, 40–55 (2022).

  178. Bottner, A. et al. Gender differences of adiponectin levels develop during the progression of puberty and are related to serum androgen levels. J. Clin. Endocrinol. Metab. 89, 4053–4061 (2004).

  179. Riley, R. D. et al. Minimum sample size for developing a multivariable prediction model: part II – binary and time-to-event outcomes. Stat. Med. 38, 1276–1296 (2019). This is an exploration of the key constraints that affect power and sample size in machine learning and prediction settings for binary and time-to-event outcomes.

  180. Riley, R. D. et al. Minimum sample size for developing a multivariable prediction model: part I – continuous outcomes. Stat. Med. 38, 1262–1275 (2019). This is an exploration of the key constraints that affect power and sample size in machine learning and prediction settings for continuous outcomes.

  181. National Human Genome Research Institute. DNA sequencing costs: data. https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data (2021).

  182. Shafi, A., Mitrea, C., Nguyen, T. & Draghici, S. A survey of the approaches for identifying differential methylation using bisulfite sequencing data. Brief. Bioinform. 19, 737–753 (2018).

  183. Zhang, L. et al. DNA methylation landscape reflects the spatial organization of chromatin in different cells. Biophys. J. 113, 1395–1404 (2017).

  184. Lin, N. et al. Genome-wide DNA methylation profiling in human breast tissue by Illumina TruSeq methyl capture EPIC sequencing and infinium methylationEPIC beadchip microarray. Epigenetics 16, 754–769 (2021).

  185. Wendt, J., Rosenbaum, H., Richmond, T. A., Jeddeloh, J. A. & Burgess, D. L. Targeted bisulfite sequencing using the SeqCap Epi enrichment system. Methods Mol. Biol. 1708, 383–405 (2018).

  186. Liu, Y. et al. DNA methylation-calling tools for Oxford Nanopore sequencing: a survey and human epigenome-wide evaluation. Genome Biol. 22, 295 (2021).

  187. Sakamoto, Y. et al. Long-read whole-genome methylation patterning using enzymatic base conversion and nanopore sequencing. Nucleic Acids Res. 49, e81 (2021). This study highlights the use of long-read sequencing to measure DNAm levels without bisulfite conversion.

  188. Shi, J. et al. The concurrence of DNA methylation and demethylation is associated with transcription regulation. Nat. Commun. 12, 5285 (2021).

  189. Pinu, F. R., Goldansaz, S. A. & Jaine, J. Translational metabolomics: current challenges and future opportunities. Metabolites 9, 108 (2019).

  190. Ignjatovic, V. et al. Mass spectrometry-based plasma proteomics: considerations from sample collection to achieving translational data. J. Proteome Res. 18, 4085–4097 (2019).

  191. Shah, S. et al. Improving phenotypic prediction by combining genetic and epigenetic associations. Am. J. Hum. Genet. 97, 75–85 (2015). This study demonstrates the additive explanatory power of combining polygenic and DNAm-based complex trait prediction, with greater benefit observed when adding DNAm information for traits with greater environmental components.

  192. Shah, S. et al. Genetic and environmental exposures constrain epigenetic drift over the human life course. Genome Res. 24, 1725–1733 (2014).

  193. Trejo Banos, D. et al. Bayesian reassessment of the epigenetic architecture of complex traits. Nat. Commun. 11, 2865 (2020).

  194. Zhang, F. et al. OSCA: a tool for omic-data-based complex trait analysis. Genome Biol. 20, 107 (2019).

  195. Ebrahim, A. et al. Multi-omic data integration enables discovery of hidden biological regularities. Nat. Commun. 7, 13091 (2016).

  196. Argelaguet, R. et al. Multi-Omics Factor Analysis — a framework for unsupervised integration of multi-omics data sets. Mol. Syst. Biol. 14, e8124 (2018).

  197. Woo, H. G. et al. Integrative analysis of genomic and epigenomic regulation of the transcriptome in liver cancer. Nat. Commun. 8, 839 (2017).

  198. Zhu, B. et al. Integrating clinical and multiple omics data for prognostic assessment across human cancers. Sci. Rep. 7, 16954 (2017).

  199. Hasin, Y., Seldin, M. & Lusis, A. Multi-omics approaches to disease. Genome Biol. 18, 83 (2017).

  200. Gadd, D. A. et al. Epigenetic scores for the circulating proteome as tools for disease prediction. eLife 11, e71802 (2022). This study highlights the potential of DNAm to index endogenous biomarkers and thus enhance prediction of phenotypes or diseases associated with these biomarkers.

  201. Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. M. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. BMC Med. 13, 1 (2015). This paper details consensus recommendations of best practices for reporting prediction modelling results as developed by an international expert panel.

  202. Moons, K. G. M., Royston, P., Vergouwe, Y., Grobbee, D. E. & Altman, D. G. Prognosis and prognostic research: what, why, and how? BMJ 338, 1317–1320 (2009).

  203. Weber, L. M. et al. Essential guidelines for computational method benchmarking. Genome Biol. 20, 125 (2019).

  204. Shmueli, G. To explain or to predict? Stat. Sci. 25, 289–310 (2010). This paper provides an accessible explanation of the distinctions between explanatory and predictive statistics in terms of aims and methodologies, as well as perspective on why such differences have been persistently confused across fields.

  205. Murray, R. P., Connett, J. E., Lauger, G. G. & Voelker, H. T. Error in smoking measures: effects of intervention on relations of cotinine and carbon monoxide to self-reported smoking. Am. J. Public Health 83, 1251 (1993).

  206. Rehm, J. & Spuhler, T. Measurement error in alcohol consumption: the Swiss Health Survey. Eur. J. Clin. Nutr. 47 (Suppl. 2), S25–S30 (1993).

  207. Subar, A. F. et al. Using intake biomarkers to evaluate the extent of dietary misreporting in a large sample of adults: the OPEN study. Am. J. Epidemiol. 158, 1–13 (2003).

  208. Adab, P., Pallan, M. & Whincup, P. H. Is BMI the best measure of obesity? BMJ 360, k1274 (2018).

  209. Greenland, S., Pearl, J. & Robins, J. M. Causal diagrams for epidemiologic research. Epidemiology 10, 37–48 (1999).

  210. Peters, J., Janzing, D. & Schölkopf, B. Elements of Causal Inference: Foundations and Learning Algorithms (MIT Press, 2017).

  211. Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. The Morgan Kaufmann Series in Representation and Reasoning (Morgan Kaufmann, 1988).

  212. Piccininni, M., Konigorski, S., Rohmann, J. L. & Kurth, T. Directed acyclic graphs and causal thinking in clinical risk prediction modeling. BMC Med. Res. Methodol. 20, 179 (2020).

  213. Austin, P. C. & Steyerberg, E. W. The Integrated Calibration Index (ICI) and related metrics for quantifying the calibration of logistic regression models. Stat. Med. 38, 4051–4065 (2019).

  214. Korologou-Linden, R., Leyden, G. M., Relton, C. L., Richmond, R. C. & Richardson, T. G. Multi-omics analyses of cognitive traits and psychiatric disorders highlights brain-dependent mechanisms. Hum. Mol. Genet. https://doi.org/10.1093/hmg/ddab016 (2021).

  215. Tsai, P. C. et al. Smoking induces coordinated DNA methylation and gene expression changes in adipose tissue with consequences for metabolic health. Clin. Epigenetics 10, 126 (2018).

  216. Smith, A. K. et al. DNA extracted from saliva for methylation studies of psychiatric traits: evidence tissue specificity and relatedness to brain. Am. J. Med. Genet. B Neuropsychiatr. Genet. 168, 36–44 (2015).

  217. Braun, P. R. et al. Genome-wide DNA methylation comparison between live human brain and peripheral tissues within individuals. Transl. Psychiatry 9, 47 (2019).

  218. Nagy, C. et al. Single-nucleus transcriptomics of the prefrontal cortex in major depressive disorder implicates oligodendrocyte precursor cells and excitatory neurons. Nat. Neurosci. 23, 771–781 (2020).

Acknowledgements

The authors thank G. Hemani for helpful discussions on genetic prediction and K. Tilling for comments on a draft manuscript. The authors’ work is supported by the Medical Research Council Integrative Epidemiology Unit at the University of Bristol (MC_UU_00011/1 & 5) and via the Cancer Research UK programme grant (C18281/A29019). The authors’ work is also supported by the NIHR Biomedical Research Centre at University Hospitals Bristol and Weston NHS Foundation Trust and the University of Bristol. The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.

Author information

Contributions

P.D.Y., M.S., R.L. and O.W. researched the literature. P.D.Y., M.S. and C.L.R. contributed substantially to discussions of the content. P.D.Y., M.S. and R.L. wrote the article. P.D.Y., M.S., G.D.S. and C.L.R. reviewed and/or edited the manuscript before submission.

Corresponding author

Correspondence to Caroline L. Relton.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Genetics thanks Christopher Bell, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Glossary

Genome-wide association studies

(GWAS). Studies that examine the statistical correlation or ‘association’ between a set of genetic polymorphisms large enough to capture most of the variation in the human genome and a given phenotype of interest.

Polygenic risk scores

(PRSs). Weighted sums of the risk alleles carried by an individual for a phenotype, in which each allele is weighted by its coefficient from the relevant genome-wide association study (GWAS). GWAS loci are typically selected for inclusion in the score by applying a P value threshold, commonly that of genome-wide significance (P < 5 × 10⁻⁸).
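
As a minimal sketch of the weighted sum described above, assuming hypothetical GWAS summary statistics (a coefficient and a P value per variant) and effect-allele dosages coded 0, 1 or 2, a P value-thresholded score could be computed as follows. The function name, variant IDs and numbers are invented for illustration.

```python
from typing import Dict, Tuple

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def polygenic_risk_score(
    dosages: Dict[str, float],
    gwas: Dict[str, Tuple[float, float]],
    p_threshold: float = GENOME_WIDE_P,
) -> float:
    """Weighted sum of effect-allele dosages over variants with GWAS P < p_threshold.

    dosages: variant ID -> number of effect alleles carried (0, 1 or 2).
    gwas: variant ID -> (effect-size coefficient, P value) from GWAS summary statistics.
    Variants absent from the individual's genotype data are skipped.
    """
    score = 0.0
    for variant, (beta, p_value) in gwas.items():
        if p_value < p_threshold and variant in dosages:
            score += beta * dosages[variant]
    return score


# Hypothetical summary statistics and genotypes, for illustration only.
gwas_summary = {
    "snp_a": (0.20, 1e-10),  # passes the threshold
    "snp_b": (-0.10, 3e-9),  # passes the threshold
    "snp_c": (0.50, 1e-3),   # fails the threshold, so it is excluded
}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 2}
prs = polygenic_risk_score(person, gwas_summary)  # 0.2*2 - 0.1*1, i.e. about 0.3
```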

Broad-sense heritability

The proportion of phenotype or trait variance attributable to genetic factors.

DNA methylation

(DNAm). An epigenetic modification whereby a methyl group (CH₃) is covalently attached to a DNA base in a mitotically stable bond. In mammals, DNAm occurs mainly at cytosine residues in CpG sites.

CpG sites

Dinucleotide sequences in which a cytosine is immediately followed by a guanine when the DNA strand is read in the 5′ to 3′ direction. The ‘p’ denotes the phosphodiester bond linking the two nucleotides.
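
A short illustrative helper (the name `cpg_positions` is hypothetical) that locates CpG sites in a DNA string read 5′ to 3′:

```python
from typing import List


def cpg_positions(sequence: str) -> List[int]:
    """0-based positions of the cytosine in each CpG dinucleotide, reading 5' to 3'."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


print(cpg_positions("ACGTTCGACG"))  # → [1, 5, 8]
```

Because CG is its own reverse complement, each CpG on one strand is paired with a CpG on the opposite strand, so scanning a single strand finds every site.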

Epigenome-wide association studies

(EWAS). Studies that examine the association between a large number of epigenetic variables and a phenotype or exposure of interest. As most have been performed using DNA methylation levels, we treat EWAS and methylome-wide association studies as synonyms.

DNAm-based predictors

Any statistical models (for example, linear models) of observed data that are used to predict values of an outcome (for example, an exposure, phenotype or disease) and in which many or all of the input variables are levels of DNA methylation (DNAm) measured at CpG sites.
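
A DNAm-based predictor built as a linear model reduces to an intercept plus a weighted sum of DNAm levels at its selected CpG sites. The sketch below assumes beta values on a 0 to 1 scale and uses hypothetical CpG IDs and coefficients; raising an error for unmeasured CpGs, rather than imputing them, is one design choice among several.

```python
from typing import Dict


def linear_dnam_predictor(
    betas: Dict[str, float],
    intercept: float,
    weights: Dict[str, float],
) -> float:
    """Intercept plus the weighted sum of DNAm beta values at the model's CpG sites.

    betas: CpG ID -> measured beta value for one sample (may include unused CpGs).
    weights: CpG ID -> model coefficient.
    Raises KeyError if a CpG used by the model was not measured.
    """
    missing = sorted(cpg for cpg in weights if cpg not in betas)
    if missing:
        raise KeyError(f"CpG sites required by the model are unmeasured: {missing}")
    return intercept + sum(w * betas[cpg] for cpg, w in weights.items())


# Hypothetical coefficients and measurements, for illustration only.
weights = {"cpg_1": 2.0, "cpg_2": -1.5}
sample = {"cpg_1": 0.8, "cpg_2": 0.4, "cpg_3": 0.1}  # cpg_3 is not used by the model
score = linear_dnam_predictor(sample, intercept=1.0, weights=weights)  # about 2.0
```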

Machine learning

Algorithms and statistical models that improve their performance with experience, that is, by optimization through training on previously collected data.

Epigenetic clocks

Estimators of biological age or other ageing phenotypes that use levels of DNA methylation or other epigenetic measurements as inputs.

Penalized regression

Linear regression modelling methods that add a numerical penalty on the overall size of the input variables' coefficients, shrinking them towards zero. Examples include lasso, ridge and elastic net regression.
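
For concreteness, one widely used form of the elastic net objective (the parameterization used by the R package glmnet for continuous outcomes) is shown below, with \(\mathbf{x}_i\) the vector of DNAm levels at p CpG sites for individual i, \(y_i\) the outcome, \(\lambda \geq 0\) the overall penalty strength and \(\alpha \in [0, 1]\) the mix between the two penalties: \(\alpha = 1\) gives the lasso and \(\alpha = 0\) gives ridge regression. The intercept \(\beta_0\) is not penalized.

```latex
(\hat{\beta}_0, \hat{\boldsymbol{\beta}})
  = \operatorname*{arg\,min}_{\beta_0,\,\boldsymbol{\beta}}
    \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - \mathbf{x}_i^{\top}\boldsymbol{\beta}\bigr)^2
    + \lambda\Bigl[\alpha\,\lVert\boldsymbol{\beta}\rVert_1
    + \frac{1-\alpha}{2}\,\lVert\boldsymbol{\beta}\rVert_2^2\Bigr]
```

Because the lasso term can set coefficients exactly to zero, it also performs feature selection, which is why lasso and elastic net models typically retain only a subset of the input CpGs.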

Linear model

A statistical description of the relationship between one or many input variables X and an observed level of an output Y, where each XY association is summarized by the slope or coefficient of the line plotted between them.

Biological age

The hypothesis that the phenotypic age of a DNA source (for example, a cell, tissue or organ) may be greater (that is, accelerated) or less (that is, decelerated) than chronological age at any given point in time.
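
One common way to operationalize acceleration, assumed here for illustration rather than prescribed by this definition, is the residual from regressing DNAm-estimated age on chronological age; a simpler alternative is the raw difference between the two. A sketch with hypothetical ages:

```python
from typing import List, Sequence


def age_acceleration(dnam_age: Sequence[float], chrono_age: Sequence[float]) -> List[float]:
    """Residuals from an ordinary least-squares regression of DNAm age on chronological age."""
    n = len(chrono_age)
    mean_c = sum(chrono_age) / n
    mean_d = sum(dnam_age) / n
    sxx = sum((c - mean_c) ** 2 for c in chrono_age)
    sxy = sum((c - mean_c) * (d - mean_d) for c, d in zip(chrono_age, dnam_age))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_c
    # Positive residual: older DNAm age than expected for this chronological age (accelerated).
    return [d - (intercept + slope * c) for c, d in zip(chrono_age, dnam_age)]


# Hypothetical clock estimates for four individuals.
residuals = age_acceleration([32, 38, 53, 57], [30, 40, 50, 60])
# approximately [0.5, -2.5, 3.5, -1.5]
```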

Mendelian randomization

An analytical method that uses genetic variants as instrumental variables to evaluate putative causal relationships between modifiable risk factors and disease outcomes.
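
As a sketch of the simplest case, assuming a single genetic variant that satisfies the instrumental-variable assumptions (associated with the exposure, independent of confounders, and affecting the outcome only through the exposure), the causal effect can be estimated by the Wald ratio. The function name and summary statistics are hypothetical.

```python
from typing import Tuple


def wald_ratio(beta_gy: float, se_gy: float, beta_gx: float) -> Tuple[float, float]:
    """Single-instrument Mendelian randomization estimate of the effect of exposure X on outcome Y.

    beta_gx: variant-exposure association; beta_gy (se_gy): variant-outcome association.
    The standard error is the common first-order approximation se_gy / |beta_gx|,
    which ignores uncertainty in beta_gx.
    """
    if beta_gx == 0:
        raise ValueError("the variant must be associated with the exposure (beta_gx != 0)")
    return beta_gy / beta_gx, se_gy / abs(beta_gx)


estimate, se = wald_ratio(beta_gy=0.02, se_gy=0.005, beta_gx=0.1)  # about (0.2, 0.05)
```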

Cell-free DNA

(cfDNA). DNA found outside cells, circulating in blood plasma. Sources can include DNA released by lysed cells from any number of tissues, including tumour cells, which are commonly of greatest interest.

Winner’s curse

The phenomenon whereby the strength of associations selected in an initial discovery sample is commonly overestimated, so that estimated effects often regress towards the mean in subsequent validation samples.
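
A small simulation illustrates the phenomenon, assuming many features that all share the same modest true effect and are estimated with noise in independent discovery and replication samples: features selected because their discovery estimates cross a threshold look stronger than they replicate. All parameter values are arbitrary.

```python
import random
from statistics import mean
from typing import Tuple


def winners_curse_demo(
    n_features: int = 1000,
    true_effect: float = 0.1,
    se: float = 0.1,
    threshold: float = 0.25,
    seed: int = 1,
) -> Tuple[float, float, int]:
    """Simulate discovery and replication estimates for features sharing one true effect.

    Returns (mean discovery estimate, mean replication estimate, number selected)
    for the features whose discovery estimate exceeded `threshold`.
    """
    rng = random.Random(seed)
    discovery = [rng.gauss(true_effect, se) for _ in range(n_features)]
    replication = [rng.gauss(true_effect, se) for _ in range(n_features)]  # independent sample
    selected = [i for i, b in enumerate(discovery) if b > threshold]
    if not selected:
        raise ValueError("no features passed the threshold; lower it or add features")
    return (
        mean([discovery[i] for i in selected]),
        mean([replication[i] for i in selected]),
        len(selected),
    )


disc, rep, n_selected = winners_curse_demo()
# disc exceeds the threshold by construction; rep falls back towards the true effect of 0.1.
```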

Linkage disequilibrium

(LD). Greater-than-chance co-occurrence of alleles at different loci due to non-random assortment.
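
For two biallelic loci, LD between allele A at one locus and allele B at the other is commonly summarized by D, the difference between the observed haplotype frequency and the frequency expected under independence, and by its standardized square r². A sketch assuming known haplotype and allele frequencies (the function name and values are hypothetical):

```python
from typing import Tuple


def ld_d_and_r2(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Linkage disequilibrium between allele A at locus 1 and allele B at locus 2.

    p_ab: frequency of haplotypes carrying both A and B; p_a, p_b: allele frequencies.
    D = p_ab - p_a * p_b is zero under independence and positive when A and B
    co-occur more often than chance; r^2 rescales D^2 to lie between 0 and 1.
    """
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d, d ** 2 / denominator


d, r2 = ld_d_and_r2(p_ab=0.5, p_a=0.6, p_b=0.6)  # D = 0.14, r^2 about 0.34
```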

Feature engineering

The process of transforming or combining possible inputs (for example, by taking their principal components or rescaling their values) to make novel super-features that better explain or predict an outcome.
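
As one example of rescaling, DNAm beta values are often transformed to M-values, log2(β/(1 − β)), and features are commonly standardized to z-scores. A sketch with hypothetical function names, where the clipping offset is an arbitrary assumption that avoids infinite values at 0 or 1:

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def beta_to_m(beta: float, offset: float = 1e-6) -> float:
    """M-value: log2(beta / (1 - beta)), with beta clipped to [offset, 1 - offset]."""
    b = min(max(beta, offset), 1.0 - offset)
    return math.log2(b / (1.0 - b))


def standardize(values: Sequence[float]) -> List[float]:
    """Rescale to mean 0 and sample standard deviation 1 (z-scores)."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


m_values = [beta_to_m(b) for b in (0.2, 0.5, 0.8)]  # about [-2.0, 0.0, 2.0]
z = standardize(m_values)  # about [-1.0, 0.0, 1.0]
```

In a prediction pipeline, the mean and standard deviation used by `standardize` would need to be estimated in the training data only and then applied unchanged to test data; otherwise information from the test set leaks into model development.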

Out-of-sample prediction error

The discrepancy between estimates of an outcome \(\hat{Y}\) generated by a predictive modelling function f and values of Y observed in a sample of data that was not available to f during model training.

In-sample prediction error

The discrepancy between estimated values of an outcome \(\hat{Y}\) generated by a modelling function f and values of Y observed in a sample of data that was available to f during model training.
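
The gap between in-sample and out-of-sample error can be made concrete with a deliberately overfitted model. The sketch below uses simulated data and a 1-nearest-neighbour regressor, chosen purely for illustration: it reproduces its training data exactly, so its in-sample error is zero even though its out-of-sample error is not.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def nn_predict(training: List[Point], x: float) -> float:
    """1-nearest-neighbour regression: return y of the training point closest in x."""
    return min(training, key=lambda point: abs(point[0] - x))[1]


def mean_squared_error(training: List[Point], evaluation: List[Point]) -> float:
    """Mean squared discrepancy between observed y and predictions from `training`."""
    return sum((y - nn_predict(training, x)) ** 2 for x, y in evaluation) / len(evaluation)


rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 2 * x + rng.gauss(0, 1)))  # hypothetical noisy linear outcome
train, held_out = data[:100], data[100:]

in_sample_error = mean_squared_error(train, train)         # 0.0: the model memorizes its training data
out_of_sample_error = mean_squared_error(train, held_out)  # clearly greater than 0
```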

Resampling

Splitting, partitioning or sampling available data to generate subsamples in which model predictions can be tested and used to estimate distributions of out-of-sample errors.
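
A minimal sketch of k-fold cross-validation, one common resampling scheme. The function names are hypothetical; the key design choice is that the user-supplied `fit` function receives only training-fold data, so steps such as feature selection happen inside each fold and the held-out fold cannot leak into the performance estimate.

```python
import random
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[float], float]


def kfold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 once and split them into k folds; return (train, test) pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    return [
        ([j for m, fold in enumerate(folds) if m != i for j in fold], folds[i])
        for i in range(k)
    ]


def cross_validated_mse(
    x: Sequence[float],
    y: Sequence[float],
    fit: Callable[[List[float], List[float]], Predictor],
    k: int = 5,
    seed: int = 0,
) -> float:
    """Estimate out-of-sample mean squared error by k-fold cross-validation.

    `fit` sees only the training fold and returns a prediction function, so any
    feature selection or rescaling done inside `fit` cannot use the held-out fold.
    """
    squared_errors = []
    for train_idx, test_idx in kfold_indices(len(y), k, seed):
        predict = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        squared_errors.extend((y[i] - predict(x[i])) ** 2 for i in test_idx)
    return sum(squared_errors) / len(squared_errors)


def fit_mean_only(xs: List[float], ys: List[float]) -> Predictor:
    """A deliberately trivial model that predicts the training-fold mean of y."""
    centre = sum(ys) / len(ys)
    return lambda _x: centre


xs = [float(i) for i in range(20)]
ys = [2.0 * v + 1.0 for v in xs]
cv_error = cross_validated_mse(xs, ys, fit_mean_only, k=5)
```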

Accuracy

The percentage of observations, across all levels (classes) of a classifier, for which the predicted class agrees with the observed class.

Confusion matrix

A frequency table of agreement and disagreement between observed and predicted values of an outcome variable. It is used to compute many classification metrics, including, among others, accuracy, sensitivity and specificity.
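
A sketch for the binary case (multi-class confusion matrices extend the same idea), assuming outcomes coded with a designated positive label; the function names and data are hypothetical.

```python
from typing import Dict, Hashable, Sequence


def confusion_counts(
    observed: Sequence[Hashable], predicted: Sequence[Hashable], positive: Hashable = 1
) -> Dict[str, int]:
    """2 x 2 confusion matrix counts for a binary outcome, given the positive class label."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for obs, pred in zip(observed, predicted):
        if pred == positive:
            counts["TP" if obs == positive else "FP"] += 1
        else:
            counts["FN" if obs == positive else "TN"] += 1
    return counts


def basic_metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity computed from the confusion matrix counts."""
    total = c["TP"] + c["FP"] + c["TN"] + c["FN"]
    return {
        "accuracy": (c["TP"] + c["TN"]) / total,
        "sensitivity": c["TP"] / (c["TP"] + c["FN"]),  # true-positive rate
        "specificity": c["TN"] / (c["TN"] + c["FP"]),  # true-negative rate
    }


observed = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1, 0]
cm = confusion_counts(observed, predicted)  # {'TP': 2, 'FP': 1, 'TN': 4, 'FN': 1}
metrics = basic_metrics(cm)  # accuracy 0.75, sensitivity 2/3, specificity 0.8
```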

Cohen’s kappa

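A confusion matrix metric that ranges up to 1 (total agreement between observed and predicted classes), with 0 indicating agreement no better than chance and negative values (to a minimum of –1) indicating worse-than-chance agreement; class imbalances are accounted for by normalizing to the agreement expected by chance given the observed class frequencies.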
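
For a binary 2 × 2 confusion matrix, kappa compares the observed agreement p_o with the agreement p_e expected by chance from the row and column totals, κ = (p_o − p_e)/(1 − p_e). A sketch with a hypothetical function name and counts:

```python
def cohens_kappa(tp: int, fp: int, tn: int, fn: int) -> float:
    """Cohen's kappa for a binary 2 x 2 confusion matrix: (p_o - p_e) / (1 - p_e)."""
    n = tp + fp + tn + fn
    p_observed = (tp + tn) / n
    # Chance agreement from the marginal frequencies of predicted and observed classes.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


kappa = cohens_kappa(tp=2, fp=1, tn=4, fn=1)  # 7/15, about 0.467
```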

Matthews correlation coefficient

A numerical summary of agreement in a confusion matrix, ranging from –1 (total disagreement) to 1 (total agreement), that seeks to correct for class imbalances using a method similar to that of a χ² statistic.
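
For a binary 2 × 2 confusion matrix, MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and its square equals the Pearson χ² statistic divided by the sample size, which is the connection to χ² noted in the definition. A sketch with hypothetical function names and counts that checks this identity:

```python
import math


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary 2 x 2 confusion matrix."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when any row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


def pearson_chi2(tp: int, fp: int, tn: int, fn: int) -> float:
    """Pearson chi-squared statistic (no continuity correction); assumes no empty row or column."""
    n = tp + fp + tn + fn
    table = [[tp, fp], [fn, tn]]  # rows: predicted positive/negative; columns: observed positive/negative
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    return sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )


mcc = matthews_corrcoef(tp=2, fp=1, tn=4, fn=1)  # 7/15, about 0.467
```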

Calibration

The extent to which predicted outcome risk matches observed outcome proportions.
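
A simple way to inspect calibration, assumed here for illustration, is to group individuals by predicted risk and compare the mean predicted risk with the observed event proportion in each group; smoothed summaries such as the Integrated Calibration Index (ref. 213) are alternatives. A sketch assuming predicted risks in [0, 1], outcomes coded 0/1 and hypothetical function names:

```python
from typing import List, Sequence, Tuple


def calibration_table(
    predicted_risk: Sequence[float], observed: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Per equal-width risk bin: (mean predicted risk, observed outcome proportion, n).

    Empty bins are omitted. A well-calibrated model has the first two values close in every bin.
    """
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for risk, outcome in zip(predicted_risk, observed):
        index = min(int(risk * n_bins), n_bins - 1)  # risk == 1.0 goes into the top bin
        bins[index].append((risk, outcome))
    table = []
    for members in bins:
        if members:
            size = len(members)
            table.append(
                (sum(r for r, _ in members) / size, sum(o for _, o in members) / size, size)
            )
    return table


def calibration_in_the_large(predicted_risk: Sequence[float], observed: Sequence[int]) -> float:
    """Observed event rate minus mean predicted risk; 0 means no overall over- or under-prediction."""
    return sum(observed) / len(observed) - sum(predicted_risk) / len(predicted_risk)


risks = [0.1, 0.15, 0.85, 0.95]
outcomes = [0, 0, 1, 1]
table = calibration_table(risks, outcomes, n_bins=2)  # about [(0.125, 0.0, 2), (0.9, 1.0, 2)]
```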

About this article

Cite this article

Yousefi, P.D., Suderman, M., Langdon, R. et al. DNA methylation-based predictors of health: applications and statistical considerations. Nat Rev Genet 23, 369–383 (2022). https://doi.org/10.1038/s41576-022-00465-w

  • DOI: https://doi.org/10.1038/s41576-022-00465-w
