Ultrasensitive detection of circulating tumour DNA via deep methylation sequencing aided by machine learning

An Author Correction to this article was published on 30 September 2021

This article has been updated

Abstract

The low abundance of circulating tumour DNA (ctDNA) in plasma samples makes the analysis of ctDNA biomarkers for the detection or monitoring of early-stage cancers challenging. Here we show that deep methylation sequencing aided by a machine-learning classifier of methylation patterns enables the detection of tumour-derived signals at dilution factors as low as 1 in 10,000. For a total of 308 patients with surgery-resectable lung cancer and 261 age- and sex-matched non-cancer control individuals recruited from two hospitals, the assay detected 52–81% of the patients at disease stages IA to III with a specificity of 96% (95% confidence interval (CI) 93–98%). In a subgroup of 115 individuals, the assay identified, at 100% specificity (95% CI 91–100%), nearly twice as many patients with cancer as those identified by ultradeep mutation sequencing analysis. The low amounts of ctDNA permitted by machine-learning-aided deep methylation sequencing could provide advantages in cancer screening and the assessment of treatment efficacy.
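
As a reading aid for the specificity figure quoted above, the following is a minimal sketch of how a 95% confidence interval for a proportion can be computed from a control cohort of 261 individuals. The abstract does not state the number of correctly classified controls or the interval method the authors used, so the count of 251 and the choice of the Wilson score interval are assumptions made purely for illustration.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical count: 251 of 261 controls classified as non-cancer.
# The true count is not given in the abstract; 251/261 is simply one value
# that rounds to the reported 96%.
true_negatives = 251
controls = 261

low, high = wilson_interval(true_negatives, controls)
print(f"specificity = {true_negatives / controls:.1%}, 95% CI = {low:.1%} to {high:.1%}")
```

Under these assumptions the interval comes out at roughly 93% to 98%, in line with the range reported in the abstract. An exact (Clopper-Pearson) interval would give slightly different bounds.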

Fig. 1: Overview of ELSA-seq for deep methylation sequencing of low-ctDNA samples.
Fig. 2: WGBS library construction by ELSA-seq.
Fig. 3: Methylation block definition and pattern recognition.
Fig. 4: Analytical validation of ELSA-seq.
Fig. 5: Classification of plasma samples in two independent cohorts.
Fig. 6: Proof-of-concept study and marker selection for LC.
Fig. 7: A parallel comparison of ELSA-seq, HS-UMI and ddPCR.

Data availability

The main data supporting the results in this study are available within the paper and its Supplementary Information. The microarray data used for identification of differentially methylated sites can be downloaded from the TCGA database at https://gdac.broadinstitute.org/runs/analyses__2016_01_28/data and from the GEO database under the accession code GSE40279. Illumina EPIC TruSeq Methyl data are available at https://basespace.illumina.com/projects/31997005. The raw sequencing data (.fastq files) generated are available from the NCBI Sequence Read Archive (SRA) repository, under the accession code PRJNA534206. The analysed datasets generated during the study are too large to be publicly shared, but they are available for research purposes from the corresponding authors on reasonable request. Any data and materials that can be shared will be released subject to a data-transfer agreement.

Code availability

Codes and scripts developed for this study are available at http://github.com/bnr-ed/mworkflow.

Acknowledgements

We thank K. Kemphues (Cornell University) for critical review of the manuscript. We thank Z. Liang, H. Wu, Z. Jin, F. Tan, S. Chuai, W. Deng, X. Mao, Y. Ma, L. Yang, J. Ye and F. Duan for their assistance with this study. This work was supported, in part, by Beijing Natural Science Foundation (grant number 7182132), the Major Projects of the Beijing Municipal Science and Technology Commission (grant number Z171100002017013), the Capital Special Project for Featured Clinical Application (grant number Z151100004015157), the Peking Union Medical College Hospital Youth Fund (grant numbers PUMCH-2016-2.25, HI626500), and the Peking Union Medical College Special Youth Teacher Project (grant numbers 2014zlgc0717; 2014zlgc0135). We acknowledge research funding, not attached to this study or to any research project or collaboration, from Burning Rock Biotech to the University of California at Berkeley.

Author information

Contributions

N.L. designed and supervised the clinical study and provided funding. B.L. designed and supervised the technical study and wrote the paper. Z.J., P.W., Y. Wang, Y. Wu, Z.C., L.C., Z.B., Hongsheng Liu, L.L., C.H., Y.Q. and Y.C. conducted the clinical study, including participant recruitment, sample preparation, and the collection and interpretation of clinical information. C. Wang, T.Z., F.Q., J.S., J. Xu, F.X., H.C., S.F., X.Y., H.H.-Z., J. Xiang and Hao Liu performed the technical development, including experiments, computational framework construction and data analysis. C. Wu and X.G. optimized the machine-learning algorithm. H.Z. and S.L. designed the clinical study and provided funding. Z.Z. conceived the idea and oversaw the overall direction. All authors discussed the results and contributed to the final manuscript.

Corresponding authors

Correspondence to Shanqing Li, Heng Zhao or Zhihong Zhang.

Ethics declarations

Competing interests

T.Z., B.L. and Z.Z. are inventors on a pending patent application held by Burning Rock Biotech related to target deep methylation sequencing (WO2019192489A1, filed in the United States, Canada, Europe, Japan, Singapore, Australia and Brazil). C. Wang, B.L. and Z.Z. are on a patent application to be submitted by Burning Rock Biotech that covers other aspects of ELSA-seq described in this article.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods and Figs. 1–10.

Reporting Summary

Supplementary Tables 1–10

Oligonucleotide sequences used in ELSA-seq, methylation-pattern counts, hotspot mutations, clinical metadata and other datasets.

About this article

Cite this article

Liang, N., Li, B., Jia, Z. et al. Ultrasensitive detection of circulating tumour DNA via deep methylation sequencing aided by machine learning. Nat Biomed Eng 5, 586–599 (2021). https://doi.org/10.1038/s41551-021-00746-5

  • DOI: https://doi.org/10.1038/s41551-021-00746-5
