Pervasive chromosomal instability and karyotype order in tumour evolution

Abstract

Chromosomal instability in cancer consists of dynamic changes to the number and structure of chromosomes1,2. The resulting diversity in somatic copy number alterations (SCNAs) may provide the variation necessary for tumour evolution1,3,4. Here we use multi-sample phasing and SCNA analysis of 1,421 samples from 394 tumours across 22 tumour types to show that continuous chromosomal instability results in pervasive SCNA heterogeneity. Parallel evolutionary events, which cause disruption in the same genes (such as BCL9, MCL1, ARNT (also known as HIF1B), TERT and MYC) within separate subclones, were present in 37% of tumours. Most recurrent losses probably occurred before whole-genome doubling, which was found as a clonal event in 49% of tumours. However, loss of heterozygosity at the human leukocyte antigen (HLA) locus and loss of chromosome 8p to a single haploid copy recurred at substantial subclonal frequencies, even in tumours with whole-genome doubling, indicating ongoing karyotype remodelling. Focal amplifications that affected chromosomes 1q21 (which encompasses BCL9, MCL1 and ARNT), 5p15.33 (TERT), 11q13.3 (CCND1), 19q12 (CCNE1) and 8q24.1 (MYC) were frequently subclonal yet appeared to be clonal within single samples. Analysis of an independent series of 1,024 metastatic samples revealed that 13 focal SCNAs were enriched in metastatic samples, including gains in chromosome 8q24.1 (encompassing MYC) in clear cell renal cell carcinoma and chromosome 11q13.3 (encompassing CCND1) in HER2+ breast cancer. Chromosomal instability may enable the continuous selection of SCNAs, which are established as ordered events that often occur in parallel, throughout tumour evolution.

Fig. 1: Overview of somatic copy number heterogeneity across tumour types.
Fig. 2: Selection shapes the SCNA landscape.
Fig. 3: Timing, recurrence and parallel evolution of subclonal SCNAs.
Fig. 4: Analysis of consensus peak regions in metastatic LUAD, ER+ and HER2+ breast cancers, and KIRC.

Data availability

TRACERx sequencing datasets used in this paper are described in previous studies7,39. Details of all other datasets obtained from third parties used in this study can be found in Supplementary Table 1. Clinical trial information (if applicable) is also available within the associated publications described in Supplementary Table 1.

Code availability

All code used for analyses was written in R version 3.6.1 and is available at: https://bitbucket.org/schwarzlab/refphase/. The Markov-chain modelling code and associated data can be found here: https://math.dartmouth.edu/~sergi/mathbio.php.

References

  1. Zack, T. I. et al. Pan-cancer patterns of somatic copy number alteration. Nat. Genet. 45, 1134–1140 (2013).

  2. Bolhaqueiro, A. C. F. et al. Ongoing chromosomal instability and karyotype evolution in human colorectal cancer organoids. Nat. Genet. 51, 824–834 (2019).

  3. Davoli, T. et al. Cumulative haploinsufficiency and triplosensitivity drive aneuploidy patterns and shape the cancer genome. Cell 155, 948–962 (2013).

  4. Turajlic, S. et al. Deterministic evolutionary trajectories influence primary tumor growth: TRACERx Renal. Cell 173, 595–610 (2018).

  5. McGranahan, N. et al. Cancer chromosomal instability: therapeutic and diagnostic challenges. ‘Exploring aneuploidy: the significance of chromosomal imbalance’ review series. EMBO Rep. 13, 528–538 (2012).

  6. Schwarz, R. F. et al. Spatial and temporal heterogeneity in high-grade serous ovarian cancer: a phylogenetic analysis. PLoS Med. 12, e1001789 (2015).

  7. Jamal-Hanjani, M. et al. Tracking the evolution of non-small-cell lung cancer. N. Engl. J. Med. 376, 2109–2121 (2017).

  8. Hieronymus, H. et al. Tumor copy number alteration burden is a pan-cancer prognostic factor associated with recurrence and death. eLife 7, e37294 (2018).

  9. Carter, S. et al. A signature of chromosomal instability inferred from gene expression profiles predicts clinical outcome in multiple human cancers. Nat. Genet. 38, 1043–1048 (2006).

  10. Schwarz, R. F. et al. Phylogenetic quantification of intra-tumour heterogeneity. PLOS Comput. Biol. 10, e1003535 (2014).

  11. von der Thüsen, J. H. et al. Prognostic significance of predominant histologic pattern and nuclear grade in resected adenocarcinoma of the lung: potential parameters for a grading system. J. Thorac. Oncol. 8, 37–44 (2013).

  12. Kadota, K. et al. Comprehensive pathological analyses in lung squamous cell carcinoma: single cell invasion, nuclear diameter, and tumor budding are independent prognostic factors for worse outcomes. J. Thorac. Oncol. 9, 1126–1139 (2014).

  13. Laughney, A. M., Elizalde, S., Genovese, G. & Bakhoum, S. F. Dynamics of tumor heterogeneity derived from clonal karyotypic evolution. Cell Rep. 12, 809–820 (2015).

  14. Elizalde, S., Laughney, A. M. & Bakhoum, S. F. A Markov chain for numerical chromosomal instability in clonally expanding populations. PLOS Comput. Biol. 14, e1006447 (2018).

  15. Sottoriva, A. et al. A Big Bang model of human colorectal tumor growth. Nat. Genet. 47, 209–216 (2015).

  16. Williams, M. J., Werner, B., Barnes, C. P., Graham, T. A. & Sottoriva, A. Identification of neutral tumor evolution across cancer types. Nat. Genet. 48, 238–244 (2016).

  17. López, S. et al. Interplay between whole-genome doubling and the accumulation of deleterious alterations in cancer evolution. Nat. Genet. 52, 283–293 (2020).

  18. Fujiwara, T. et al. Cytokinesis failure generating tetraploids promotes tumorigenesis in p53-null cells. Nature 437, 1043–1047 (2005).

  19. Bielski, C. M. et al. Genome doubling shapes the evolution and prognosis of advanced cancers. Nat. Genet. 50, 1189–1195 (2018).

  20. McGranahan, N. et al. Allele-specific HLA loss and immune escape in lung cancer evolution. Cell 171, 1259–1271 (2017).

  21. Snyder, A. et al. Genetic basis for clinical response to CTLA-4 blockade in melanoma. N. Engl. J. Med. 371, 2189–2199 (2014).

  22. Kim, M. et al. Comparative oncogenomics identifies NEDD9 as a melanoma metastasis gene. Cell 125, 1269–1281 (2006).

  23. Cai, Y. et al. Loss of chromosome 8p governs tumor progression and drug response by altering lipid metabolism. Cancer Cell 29, 751–766 (2016).

  24. Bakhoum, S. F. et al. Chromosomal instability drives metastasis through a cytosolic DNA response. Nature 553, 467–472 (2018).

  25. Lackner, C. et al. Convergent evolution of copy number alterations in multi-centric hepatocellular carcinoma. Sci. Rep. 9, 4611 (2019).

  26. Jakubek, Y. A. et al. Large-scale analysis of acquired chromosomal alterations in non-tumor samples from patients with cancer. Nat. Biotechnol. 38, 90–96 (2020).

  27. Zaccaria, S. & Raphael, B. J. Characterizing the allele- and haplotype-specific copy number landscape of cancer genomes at single-cell resolution with CHISEL. Nat. Biotechnol. https://doi.org/10.1038/s41587-020-0661-6 (2020).

  28. Shih, D. J. H. et al. Genomic characterization of human brain metastases identifies drivers of metastatic lung adenocarcinoma. Nat. Genet. 52, 371–377 (2020).

  29. Turner, K. M. et al. Extrachromosomal oncogene amplification drives tumour evolution and genetic heterogeneity. Nature 543, 122–125 (2017).

  30. Worrall, J. T. et al. Non-random mis-segregation of human chromosomes. Cell Rep. 23, 3366–3380 (2018).

  31. Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).

  32. Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).

  33. Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).

  34. Gundem, G. et al. The evolutionary history of lethal metastatic prostate cancer. Nature 520, 353–357 (2015).

  35. Yates, L. R. et al. Subclonal diversification of primary breast cancer revealed by multiregion sequencing. Nat. Med. 21, 751–759 (2015).

  36. Yates, L. R. et al. Genomic evolution of breast cancer metastasis and relapse. Cancer Cell 32, 169–184 (2017).

  37. Mitchell, T. J. et al. Timing the landmark events in the evolution of clear cell renal cell cancer: TRACERx Renal. Cell 173, 611–623 (2018).

  38. Martinez, P. et al. Parallel evolution of tumour subclones mimics diversity between tumours. J. Pathol. 230, 356–364 (2013).

  39. Rosenthal, R. et al. Neoantigen-directed immune escape in lung cancer evolution. Nature 567, 479–485 (2019).

  40. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  41. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).

  42. Castel, S. E., Mohammadi, P., Chung, W. K., Shen, Y. & Lappalainen, T. Rare variant phasing and haplotypic expression from RNA sequencing with phASER. Nat. Commun. 7, 12817 (2016).

  43. Rimmer, A. et al. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat. Genet. 46, 912–918 (2014).

  44. Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385 (2018).

  45. Forbes, S. A. et al. COSMIC: exploring the world’s knowledge of somatic mutations in human cancer. Nucleic Acids Res. 43, D805–D811 (2015).

  46. Hartigan, J. A. & Hartigan, P. M. The dip test of unimodality. Ann. Stat. 13, 70–84 (1985).

  47. Maechler, M. diptest: Hartigan’s dip test statistic for unimodality—corrected. R package version 0.75-7 https://cran.r-project.org/package=diptest (2015).

  48. Wolff, A. C. et al. Recommendations for human epidermal growth factor receptor 2 testing in breast cancer: American Society of Clinical Oncology/College of American Pathologists clinical practice guideline update. J. Clin. Oncol. 31, 3997–4013 (2013).

  49. Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).

  50. Fungtammasan, A., Walsh, E., Chiaromonte, F., Eckert, K. A. & Makova, K. D. A genome-wide analysis of common fragile sites: what features determine chromosomal instability in the human genome? Genome Res. 22, 993–1005 (2012).

  51. Wang, K. et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 17, 1665–1674 (2007).

  52. Cheng, J. et al. Single-cell copy number variation detection. Genome Biol. 12, R80 (2011).

  53. Whitfield, M. L. et al. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol. Biol. Cell 13, 1977–2000 (2002).

  54. Abbosh, C. et al. Phylogenetic ctDNA analysis depicts early-stage lung cancer evolution. Nature 545, 446–451 (2017).

  55. Priestley, P. et al. Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216 (2019).

  56. Moulos, P. & Hatzis, P. Systematic integration of RNA-seq statistical algorithms for accurate detection of differential gene expression patterns. Nucleic Acids Res. 43, e25 (2015).

Acknowledgements

T.B.K.W. was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001169), the UK Medical Research Council (FC001169) and the Wellcome Trust (FC001169) as well as the Marie Curie ITN Project PLOIDYNET (FP7-PEOPLE-2013, 607722), Breast Cancer Research Foundation (BCRF), Royal Society Research Professorships Enhancement Award (RP/EA/180007) and the Foulkes Foundation. E.L.L. receives funding from NovoNordisk Foundation (ID 16584). N.J.B. is a fellow of the Lundbeck Foundation and acknowledges funding from the Aarhus University Research Foundation. E.G. is funded by the European Research Council, FP7-THESEUS-617844 and PROTEUS-835297. J.D. is a postdoctoral fellow of the Research Foundation–Flanders (FWO) and the European Union’s Horizon 2020 research and innovation program (Marie Skłodowska-Curie grant agreement no. 703594-DECODE). R.R. is supported by Royal Society Research Professorships Enhancement Award (RP/EA/180007). K.L. is supported by a UK Medical Research Council Skills Development Fellowship Award (grant number MR/P014712/1). L.Y. was funded by a Wellcome Trust Clinical Career Development Fellowship 214584/Z/18/Z and CRUK Early Detection Pump Prime Award. B.C.B. is supported by an NCI Outstanding Investigator Award (1R35CA220481). G.B.J. is supported by the Swedish Cancer Society, Swedish Research Council and the Berta Kamprad Foundation. S.L. is supported by the National Breast Cancer Foundation of Australia Endowed Chair and the Breast Cancer Research Foundation, New York. N.M.L. and G.D.C. were supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC010110), the UK Medical Research Council (FC010110) and the Wellcome Trust (FC010110). S.T. is funded by Cancer Research UK (grant number C50947/A18176), the National Institute for Health Research (NIHR) Biomedical Research Centre at The Royal Marsden Hospital and Institute of Cancer Research (grant number A109), the Kidney and Melanoma Cancer Fund of The Royal Marsden Cancer Charity, and The Rosetrees Trust (grant number A2204). M.J.-H. has received funding from Cancer Research UK, National Institute for Health Research, Rosetrees Trust, UKI NETs and NIHR University College London Hospitals Biomedical Research Centre. P.V.L. is supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001202), the UK Medical Research Council (FC001202) and the Wellcome Trust (FC001202) and is a Winton Group Leader in recognition of the Winton Charitable Foundation’s support towards the establishment of The Francis Crick Institute. S.F.B. is supported by the Office of the Director, the National Institutes of Health under award number DP5OD026395 High-Risk High-Reward Program, the Department of Defense Breast Cancer Research Breakthrough Award W81XWH-16-1-0315 (project: BC151244), the Burroughs Wellcome Fund Career Award for Medical Scientists, the Parker Institute for Immunotherapy at MSKCC, the Josie Robertson Foundation and MSKCC core grant P30-CA008748. R.F.S. and M.P. thank the Helmholtz Association (Germany) for support. N.M. is a Sir Henry Dale Fellow, jointly funded by the Wellcome Trust and the Royal Society (Grant Number 211179/Z/18/Z) and also receives funding from Cancer Research UK, Rosetrees and the NIHR BRC at University College London Hospitals and the CRUK University College London Experimental Cancer Medicine Centre. C.S. is Royal Society Napier Research Professor. His work was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001169), the UK Medical Research Council (FC001169), and the Wellcome Trust (FC001169). C.S. is funded by Cancer Research UK (TRACERx, PEACE and CRUK Cancer Immunotherapy Catalyst Network), Cancer Research UK Lung Cancer Centre of Excellence, the Rosetrees Trust, Butterfield and Stoneygate Trusts, NovoNordisk Foundation (ID16584), Royal Society Research Professorships Enhancement Award (RP/EA/180007), the NIHR BRC at University College London Hospitals, the CRUK-UCL Centre, Experimental Cancer Medicine Centre and the Breast Cancer Research Foundation (BCRF). This research is supported by a Stand Up To Cancer-LUNGevity-American Lung Association Lung Cancer Interception Dream Team Translational Research Grant (SU2C-AACR-DT23-17). Stand Up To Cancer is a program of the Entertainment Industry Foundation. Research grants are administered by the American Association for Cancer Research, the Scientific Partner of SU2C. C.S. also receives funding from the European Research Council (ERC) under the European Union’s Seventh Framework Programme (FP7/2007-2013) Consolidator Grant (FP7-THESEUS-617844), European Commission ITN (FP7-PloidyNet 607722), an ERC Advanced Grant (PROTEUS) from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (835297) and Chromavision from the European Union’s Horizon 2020 research and innovation programme (665233). The results published here are based in part on data generated by The Cancer Genome Atlas pilot project established by the NCI and the National Human Genome Research Institute. The data were retrieved through database of Genotypes and Phenotypes (dbGaP) authorization (accession number phs000178.v9.p8). Information about TCGA and the constituent investigators and institutions of the TCGA research network can be found at http://cancergenome.nih.gov/. This project was enabled through access to the MRC eMedLab Medical Bioinformatics infrastructure, supported by the Medical Research Council (MR/L016311/1). In particular, we acknowledge the support of the High-Performance Computing at the Francis Crick Institute as well as the UCL Department of Computer Science Cluster and the support team. This publication and the underlying study have been made possible partly on the basis of the data that the Hartwig Medical Foundation and the Center of Personalised Cancer Treatment (CPCT-02, NCT01855477) and DRUP clinical study (NCT02925234) have made available to the project.

Author information

Contributions

T.B.K.W. and E.L.L. created the genomics pipeline, designed and conducted bioinformatics analyses and wrote the manuscript. M.P. performed phylogenetic analyses and MRCA reconstructions. S.E. designed and performed the Markov-chain modelling and analysis with S.F.B. providing further analysis and comments. N.J.B., G.A.W., J.D., S.C.D., S.H., K. Haase, M.E., R.R., H.X., K.L., T.P.M. and M.D. provided considerable bioinformatics support. D.A.M. analysed pathology mitotic index and anisonucleosis measurements. E.G., A.R., D.B., S.M.D. and W.T.L. critically assessed the biological soundness of the methods and results. L.A., M.A.B. and L.S. helped to analyse patient clinical characteristics. G.D.C., P.L., I.N., K. Harbst, F.C.-G., L.R.Y., F.C., F.J., C.V., I.P.M.T., P.K.B., R.J.C., B.C.B., L.D., G.B.J., P.S., S.L. and F.A. helped with data access and avenues of enquiry related to individual tumour types. N.S. and V.C.G.T.-H. collated data for the Hartwig Medical Foundation. Z.S., N.M.L., P.J.C. and P.V.L. helped to direct the avenues of bioinformatics analysis and gave feedback on the manuscript. S.T. and M.J.-H. designed study protocols and helped to analyse patient clinical characteristics. R.F.S., N.M. and C.S. jointly designed and supervised the study and helped to write the manuscript.

Corresponding authors

Correspondence to Roland F. Schwarz, Nicholas McGranahan or Charles Swanton.

Ethics declarations

Competing interests

G.A.W. has consulted for and has stock options in Achilles Therapeutics. D.A.M. reports speaker fees from AstraZeneca. M.A.B. has consulted for Achilles Therapeutics. C.V. has received travel expenses from Astellas, Roche and Pfizer, and grant support from Bristol Myers Squibb. R.R. has consulted for and has stock options in Achilles Therapeutics. K.L. reports speaker fees from Roche Tissue Diagnostics. P.K.B. has consulted for Angiochem, Roche-Genentech, Eli Lilly, Tesaro, ElevateBio, Pfizer (Array), and received grant or research support from Merck, Bristol Myers Squibb and Eli Lilly and honoraria from Merck, Roche-Genentech and Eli Lilly. L.D. has sponsored research agreements with C2i-genomics, Natera, AstraZeneca and Ferring, and has an advisory/consulting role at Ferring. P.S. serves as an uncompensated consultant for Roche-Genentech. S.L. receives research funding to her institution from Novartis, Bristol Myers Squibb, Merck, Roche-Genentech, Puma Biotechnology, Pfizer, Eli Lilly and Seattle Genetics, has acted as consultant (not compensated) to Seattle Genetics, Pfizer, Novartis, Bristol Myers Squibb, Merck, AstraZeneca and Roche-Genentech and has acted as consultant (paid to her institution) to Aduro Biotech, Novartis, GlaxoSmithKline and G1 Therapeutics. F.A. is a member of the Advisory Boards for Pfizer, AstraZeneca, Eli Lilly, Roche-Genentech, Novartis and Daiichi Sankyo, acknowledges grant support from Pfizer, AstraZeneca, Eli Lilly, Novartis and Daiichi Sankyo and is a co-founder of Pegacsy. V.C.G.T.-H. reports grants and personal fees from Pfizer, Roche, Novartis and Eli Lilly, grants from Eisai and personal fees from Accord. S.T. has received funding from Ventana Medical Systems Inc (grant numbers 10467 and 10530), has received speaking fees from Roche, AstraZeneca, Novartis and Ipsen and has the following European and US patent filed: Indel mutations as a therapeutic target and predictive biomarker (PCTGB2018/051892) and European patent: Clear Cell Renal Cell Carcinoma Biomarkers (P113326GB). M.J.-H. is a member of the Advisory Board for Achilles Therapeutics. S.F.B. holds a patent related to some of the work described targeting CIN and the cGAS-STING pathway in advanced cancer, owns equity in, receives compensation from and serves as a consultant and on the Scientific Advisory Board and Board of Directors of Volastra Therapeutics, and has also consulted for Sanofi, received sponsored travel from the Prostate Cancer Foundation, and both travel and compensation from Cancer Research UK. N.M. has stock options in and has consulted for Achilles Therapeutics and holds a European patent in determining HLA LOH (PCT/GB2018/052004). C.S. acknowledges grant support from Pfizer, AstraZeneca, Bristol Myers Squibb, Roche-Ventana, Boehringer-Ingelheim, Archer Dx Inc (collaboration in minimal residual disease sequencing technologies) and Ono Pharmaceutical, is an AstraZeneca Advisory Board Member and Chief Investigator for the MeRmaiD1 clinical trial, has consulted for Pfizer, Novartis, GlaxoSmithKline, MSD, Bristol Myers Squibb, Celgene, AstraZeneca, Illumina, Genentech, Roche-Ventana, GRAIL, Medicxi and the Sarah Cannon Research Institute, has stock options in Apogen Biotechnologies, Epic Bioscience, GRAIL, and has stock options and is co-founder of Achilles Therapeutics. C.S. holds European patents relating to assay technology to detect tumour recurrence (PCT/GB2017/053289); to targeting neoantigens (PCT/EP2016/059401), identifying patient response to immune checkpoint blockade (PCT/EP2016/071471), determining HLA LOH (PCT/GB2018/052004), predicting survival rates of patients with cancer (PCT/GB2020/050221), identifying patients who respond to cancer treatment (PCT/GB2018/051912), a US patent relating to detecting tumour mutations (PCT/US2017/28013) and both a European and US patent related to identifying insertion/deletion mutation targets (PCT/GB2018/051892).

Additional information

Peer review information Nature thanks Rameen Beroukhim and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Measuring CIN across tumour types.

a, Schematic of the analyses of allele-specific copy number alterations. Left, the SCNA profiles across the genome for the two samples of a tumour (red, A allele; blue, B allele), with raw allele-specific copy number values for heterozygous SNPs shown as points and inferred allele-specific integer copy number states as lines. The clonality of the SCNAs across the two samples is indicated by a track between the two SCNA profiles, with clonal SCNAs indicated in grey, subclonal SCNAs in yellow and both clonal and subclonal SCNAs in dashed yellow and grey. All SCNA profile plots in the figure are scaled by the number of data points per chromosome. Top right, the approach to summarise SCNA timing (clonal versus subclonal) from the tumour. Bottom right, the integer SCNA profile across the genome of the inferred MRCA based on the integer SCNA profiles of the two samples of the tumour. b, c, Multi-sample phasing (b) and SCNA calling relative to ploidy (c). b, Multi-sample phasing is the method that we used to obtain allele-specific copy number profiles. This allowed us to identify previously undetected allelic imbalance (yellow boxes), and mirrored subclonal allelic imbalance and parallel SCNAs (purple boxes). c, Chromosomal illustrations and nomenclature of various SCNAs. As SCNAs are reported relative to ploidy, illustrations are provided for the diploid, triploid and tetraploid states. AI, allelic imbalance. d, e, Pan-cancer cohort characteristics. Our pan-cancer multi-sample cohort is summarised by tumour type in these bar plots, indicating the total number of patients (d) with the bar plot coloured according to the number of samples each tumour contributes, and tumour samples (e) with the bar plot coloured according to the type of sample.
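
To make the clonal versus subclonal distinction in panel a and the ploidy-relative calling in panel c concrete, the following Python sketch classifies a single genomic segment across the samples of one tumour. It is only an illustration: the calling rule (gain above, loss below the rounded sample ploidy), the function names and the input values are assumptions, and the published analysis was written in R with rules that are not reproduced here.

```python
from typing import List


def call_scna(total_cn: int, ploidy: float) -> str:
    """Call a segment relative to sample ploidy (assumed rule, not the paper's)."""
    baseline = round(ploidy)  # e.g. 2 for a near-diploid, 4 for a near-tetraploid sample
    if total_cn > baseline:
        return "gain"
    if total_cn < baseline:
        return "loss"
    return "neutral"


def classify_clonality(calls: List[str]) -> str:
    """Clonal if every sample carries the same SCNA, subclonal if only some do."""
    events = {c for c in calls if c != "neutral"}
    if not events:
        return "none"
    if len(events) == 1 and all(c != "neutral" for c in calls):
        return "clonal"
    return "subclonal"


# Hypothetical two-sample tumour: total copy number 3 in both samples,
# but one sample is near-diploid and the other near-tetraploid.
calls = [call_scna(3, 2.1), call_scna(3, 3.8)]
print(calls, classify_clonality(calls))
```

Because the calls are made relative to ploidy, the same absolute copy number can count as a gain in one sample and a loss in another, which is why the legend illustrates diploid, triploid and tetraploid states separately.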

Extended Data Fig. 2 SCNA correlates across tumour types.

a, Scatter plots indicating, for each tumour type, the association between the number of samples and the proportion of the genome affected by subclonal SCNAs. ρ and P values are from Spearman correlation tests. b, Scatter plots showing median purity per tumour versus the proportion of the genome affected by subclonal SCNAs. ρ and P values are from Spearman correlation tests. c, Comparing the proportion of the genome affected by clonal and subclonal SCNAs. The median value for each tumour type is indicated. The size of the dots indicates the number of tumours in the corresponding tumour type. Red dots indicate tumour types with significant differences in the proportion of the genome affected by clonal versus subclonal SCNAs. A two-sided Student’s t-test was used to compare proportions of the genome affected by clonal and subclonal SCNAs. a–c, Tumour types with tumour samples from at least 10 patients were included: bladder urothelial carcinoma (BLCA, n = 26), ER+ breast cancer (ER+ BRCA, n = 19), HER2+ breast cancer (HER2+ BRCA, n = 18), triple-negative breast cancer (TN BRCA, n = 17), colorectal adenocarcinoma (COAD, n = 13), oesophageal adenocarcinoma (ESCA, n = 22), glioma (n = 12), clear cell renal cell carcinoma (KIRC, n = 54), lung adenocarcinoma (LUAD, n = 84), lung squamous cell carcinoma (LUSC, n = 31), prostate adenocarcinoma (PRAD, n = 10), melanoma (SKCM, n = 30) and endometrial carcinoma (UCEC, n = 27). d, The results of the linear regression analysis between LUAD and HER2+ breast cancer of the proportion of the genome subject to subclonal SCNAs along with the number of samples from each tumour and the median sample purity for each tumour.
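
The ρ values in panels a and b are Spearman rank correlations. As a reminder of what that statistic measures, here is a minimal, dependency-free Python sketch that ranks both variables (averaging ranks for ties) and takes the Pearson correlation of the ranks. The function names and the input numbers are hypothetical, and the sketch does not compute P values; the authors' own analysis was performed in R.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical data: samples per tumour vs proportion of genome with subclonal SCNAs.
n_samples = [2, 3, 3, 5, 7]
subclonal_fraction = [0.05, 0.20, 0.10, 0.30, 0.25]
print(spearman_rho(n_samples, subclonal_fraction))
```

Working on ranks makes the statistic insensitive to the skewed distributions that proportions of the genome typically show, which is presumably why a rank-based test is used here rather than Pearson correlation.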

Extended Data Fig. 3 NSCLC SCNAs correlate with cell cycle gene expression and tumour cell characteristics.

a, b, Scatter plots comparing the average cell cycle gene expression in LUAD tumours (n = 36), LUSC tumours (n = 15) and NSCLC-other tumours (n = 7) with the total proportion of the genome affected by SCNAs (a) and the proportion of the genome affected by clonal SCNAs (b). Each dot is coloured according to tumour type. c, The proportion of the genome affected by subclonal SCNAs. d, The proportion of SCNAs that are subclonal. a–d, ρ and P values are from Spearman correlation tests. e–h, Associations between tumour cell characteristics and SCNA statistics for LUAD (n = 53), LUSC (n = 27) and NSCLC-other (n = 3). Mitotic index scores for each tumour are compared against total SCNAs (e), clonal SCNAs (f), subclonal SCNAs (g) and the proportion of SCNAs that are subclonal (h) in each tumour. Each dot is coloured according to tumour type. ρ and P values are from Spearman correlation tests. i–l, Association between tumour volume and SCNA metrics. For each tumour for which both digitized slides and tumour volume information were available (n = 83), we performed Spearman correlation tests comparing the tumour volume with the total proportion of the genome affected by SCNAs (i), the proportion of the genome affected by clonal SCNAs (j), the proportion of the genome affected by subclonal SCNAs (k) and the proportion of SCNAs that are subclonal (l). Padj values reflect P values from linear regression models incorporating the number of samples as well as estimated tumour volume and SCNA measure investigated. m–p, Associations between tumour cell characteristics and SCNA statistics for LUAD (n = 53), LUSC (n = 27) and NSCLC-other (n = 3). Anisonucleosis scores for each tumour are compared with the proportion of the genome affected by SCNAs (m), clonal SCNAs (n) or subclonal SCNAs (o) and the proportion of SCNAs that are subclonal (p) in each tumour. Each dot is coloured according to tumour type. The lines represent the median of each group. es, effect size.

Extended Data Fig. 4 WGD across tumour types.

a, Bar plots indicating the number and proportion of tumours of each tumour type that show WGD. Subclonal WGD tumours are indicated in blue. b, Beeswarm plots comparing the proportion of the genome affected by clonal or subclonal SCNAs and mirrored subclonal allelic imbalance (MSAI) in WGD and non-WGD tumours. Black bars indicate the median of each distribution. Two-sided Student’s t-tests were used for each comparison. c, Comparing the proportion of the genome affected by clonal or subclonal SCNAs in matched WGD and non-WGD samples from tumours with subclonal WGD. Bars indicate, for each patient with subclonal WGD, the difference between the median proportion of the genome affected by SCNAs in WGD and non-WGD samples. The inset beeswarm plots compare the proportion of the genome affected by different types of SCNAs in WGD and non-WGD samples. The black bars in the beeswarm plots represent the medians of each group. d–f, Impact of OG–TSG score on average arm-level copy number changes. Scatter plots showing the average subclonal arm-level change from MRCA in non-WGD (d; n = 171), WGD (e; n = 194) and subclonal WGD (f; n = 29) tumours versus arm OG–TSG score. Shaded areas indicate the 95% confidence interval. ρ and P values are from Spearman correlation tests. g, Scatter plot showing the average clonal (MRCA) copy number in the entire cohort (n = 394) versus chromosome arm size. h–j, Scatter plots showing the average subclonal arm-level change from MRCA in non-WGD (h; n = 171), WGD (i; n = 194) and subclonal WGD (j; n = 29) tumours versus chromosome size. Shaded areas indicate the 95% confidence interval. ρ and P values are from Spearman correlation tests.

Extended Data Fig. 5 Markov chain modelling of karyotype evolution.

a, List of parameters used for Markov chain modelling. b, Diagrams of simplified Markov chain for each chromosome arm and bar charts of the resulting probability distributions of arm-level copy number. c–e, Beeswarm plots showing the difference in deviance score on a per-tumour basis for non-WGD (n = 171), WGD (n = 194) and subclonal WGD (n = 29) tumours. Black horizontal bars indicate the median of the distribution. Paired two-tailed Student’s t-tests were performed between the deviance scores of the first and second model included in each comparison. es, effect size. c, Comparison between the unweighted (neutral) model and the weighted model that includes OG–TSG scores. d, Comparison between the unweighted model and the model with scrambled OG–TSG scores. e, Comparison between the weighted model that includes OG–TSG scores and the model with scrambled OG–TSG scores. f, g, For each context (non-WGD, WGD or subclonal WGD), the percentage of samples in which the OG–TSG-weighted model outperforms the unweighted model (f) or scrambled model (g) is shown. h–j, Robustness analysis of the Markov chain model of karyotype evolution. Graphs show the relative performance of the three iterations of the model with varying values of g with non-WGD (pGD = 0), WGD (pGD = 0.005) and subclonal WGD (pGD = 0.012) input. The model with scrambled scores has been run for 10 different random permutations of the chromosomes. k, l, Graphs show the performance of three iterations of the model with changing values of pGD (pGD = 0.003 in k and pGD = 0.007 in l) with WGD data. m, n, Graphs show the performance of three iterations of the model with changing values of pGD (pGD = 0.01 in m and pGD = 0.014 in n) with subclonal WGD data. o–q, Graphs show the performance of the three iterations of the model when varying pmisseg with non-WGD, WGD and subclonal WGD input data.
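
To illustrate how a per-arm Markov chain can produce a probability distribution of arm-level copy number, the sketch below evolves one arm over a number of generations. It is a toy model under stated assumptions and not the authors' implementation: the transition structure, the `gain_bias` parameter (a hypothetical stand-in for OG–TSG weighting), the copy-number cap, the number of generations and the value of `p_misseg` are all assumptions. Only the parameter names g, pGD and pmisseg and the value pGD = 0.005 appear in the legend.

```python
def step(dist, p_misseg, p_gd, gain_bias=0.5, max_cn=8):
    """Advance a copy-number probability distribution by one generation.

    dist[cn] is the probability that the arm has copy number cn.
    Assumed transitions (hypothetical): a missegregation gains or loses one
    copy, a genome doubling doubles the copy number, otherwise no change.
    """
    new = [0.0] * (max_cn + 1)
    p_gain = p_misseg * gain_bias
    p_loss = p_misseg * (1.0 - gain_bias)
    p_stay = 1.0 - p_misseg - p_gd
    for cn, prob in enumerate(dist):
        if cn == 0:
            new[0] += prob  # a fully lost arm cannot be regained
            continue
        new[min(cn + 1, max_cn)] += prob * p_gain
        new[cn - 1] += prob * p_loss
        new[min(2 * cn, max_cn)] += prob * p_gd
        new[cn] += prob * p_stay
    return new


def evolve(generations, p_misseg, p_gd, gain_bias=0.5, start_cn=2, max_cn=8):
    """Return the copy-number distribution after the given number of generations."""
    dist = [0.0] * (max_cn + 1)
    dist[start_cn] = 1.0  # start from a diploid arm
    for _ in range(generations):
        dist = step(dist, p_misseg, p_gd, gain_bias, max_cn)
    return dist


def mean_copy_number(dist):
    return sum(cn * prob for cn, prob in enumerate(dist))


# Hypothetical settings: 100 generations, p_misseg = 0.01; pGD = 0.005 as in
# the WGD context of the legend. gain_bias = 0.5 plays the neutral model.
neutral = evolve(100, p_misseg=0.01, p_gd=0.005)
gain_favoured = evolve(100, p_misseg=0.01, p_gd=0.005, gain_bias=0.7)
loss_favoured = evolve(100, p_misseg=0.01, p_gd=0.005, gain_bias=0.3)
```

In this toy version, shifting `gain_bias` away from 0.5 skews the resulting distribution towards gains or losses, which is the kind of difference a weighted model would be expected to show relative to a neutral one.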

Extended Data Fig. 6 Subclonal SCNA landscape across tumour types.

a–h, The following tumour types were analysed: bladder urothelial carcinoma (a; n = 26), ER+ breast cancer (b; n = 19), HER2+ breast cancer (c; n = 18), triple-negative breast cancer (d; n = 17), colorectal adenocarcinoma (e; n = 13), oesophageal adenocarcinoma (f; n = 22), glioma (g; n = 12) and KIRC (h; n = 54). n numbers represent tumours. Across-genome plots show clonal and subclonal SCNAs. Within each tumour type for each chromosome, the following data are shown (top to bottom): the proportion of patients with gains or amplifications. The black line indicates the total proportion of patients with gains/amplifications; the yellow and grey lines or shades indicate the proportion of patients with subclonal and clonal gains, respectively. The MRCA was derived by phylogenetic analysis (see Methods, ‘Ancestral reconstruction and phylogeny inference’). For each locus, the frequency of gains (red) and losses (blue) found in the MRCAs of the tumours are indicated. The GISTIC2.0 events. These tracks indicate significant SCNA focal events that were identified by GISTIC2.0 (see Methods, ‘GISTIC2.0 peak definition’ and ‘GISTIC2.0 consensus peak definition’) and recurrent arm-level events (see Methods, ‘Arm-level SCNA definition’). The proportion of patients with loss/LOH events. The black line indicates the total proportion of patients with loss/LOH events; the yellow and grey lines or shades indicate the proportion of patients with subclonal and clonal losses, respectively. The black, yellow and grey lines indicate significance thresholds for total loss/LOH, subclonal loss/LOH and clonal loss/LOH, respectively. Proportion of patients with mirrored subclonal allelic imbalance (MSAI) originating from distinct haplotypes identified by multi-sample phasing. The red line indicates the significance threshold determined by a permutation test at the 0.05 level (see Methods, ‘Permutation test for recurrence of SCNA across tumours’).

Extended Data Fig. 7 Subclonal SCNA landscape across tumour types.

a–e, The following tumour types were analysed: LUAD (a; n = 84), LUSC (b; n = 31), prostate adenocarcinoma (c; n = 10), SKCM (d; n = 30) and endometrial carcinoma (e; n = 27). Across-genome plots show clonal and subclonal SCNAs. Within each tumour type for each chromosome, the following data are shown (top to bottom): the proportion of patients with gains or amplifications. The black line indicates the total proportion of patients with gains/amplifications; the yellow and grey lines or shades indicate the proportion of patients with subclonal and clonal gains, respectively. The MRCA was derived by phylogenetic analysis (see Methods, ‘Ancestral reconstruction and phylogeny inference’). For each locus, the frequency of gains (red) and losses (blue) found in the MRCAs of the tumours are indicated. The GISTIC2.0 events. These tracks indicate significant SCNA focal events that were identified by GISTIC2.0 (see Methods, ‘GISTIC2.0 peak definition’ and ‘GISTIC2.0 consensus peak definition’) and recurrent arm-level events (see Methods, ‘Arm-level SCNA definition’). The proportion of patients with loss/LOH events. The black line indicates the total proportion of patients with loss/LOH events; the yellow and grey lines or shades indicate the proportion of patients with subclonal and clonal losses, respectively. The black, yellow and grey lines indicate significance thresholds for total loss/LOH, subclonal loss/LOH and clonal loss/LOH, respectively. Proportion of patients with mirrored subclonal allelic imbalance (MSAI) originating from distinct haplotypes identified by multi-sample phasing. The red line indicates the significance threshold determined by a permutation test at the 0.05 level (see Methods, ‘Permutation test for recurrence of SCNA across tumours’).

Extended Data Fig. 8 Recurrent SCNA across tumour types.

a, b, Difference in gains and losses in consensus-peak region gains (red, n = 255) and losses (blue, n = 149) (a) and chromosome arm gains (red, n = 95) and losses (blue, n = 200) across all tumour types (b). Black horizontal bars indicate the median of the distribution. Significance testing was performed using an unpaired Student's t-test. c, Classification of chromosomal arm-level events according to timing. Left, heat map of the percentage of subclonal occurrence of all events in each tumour type. The numerator within each cell indicates, in that tumour type, the total number of subclonal occurrences of that event and the denominator indicates the total number of both clonal and subclonal occurrences of that event in that tumour type. Shading of each cell in the heat map indicates the percentage of subclonal occurrences of an event within a tumour type with orange indicating a higher subclonality and grey indicating a higher clonality. The border of each cell indicates the classification of that event in a tumour type as either early (grey border), intermediate (no border) or late (orange border). Right, bar plot of arm-level events ordered by median percentage of subclonal occurrences across tumour types (bottom axis). Bars representing gain events are coloured in red and loss events are coloured in blue. Horizontal black lines indicate separation of events into pan-cancer categories of early, intermediate and late, according to tertiles of the median proportion of SCNAs that is subclonal. Dots centred on the same axis positions indicate the total event count of each loss or gain event across tumour types (top axis). d, Enrichment of early, intermediate and late consensus peak events with known cancer-associated genes. Heat map indicating the resulting P values from two-sided Fisher’s exact tests comparing the overlap of genes in early, intermediate and late consensus peaks with previously reported oncogenes and tumour-suppressor genes. 
Gain peaks were investigated in relation to oncogenes, while loss peaks were investigated in relation to tumour-suppressor genes. Significant overlaps (Benjamini–Hochberg-adjusted P < 0.05) are indicated with an asterisk (see Methods, ‘Cancer-associated gene and fragile site enrichment’). e, Enrichment of early, intermediate and late consensus peak events with chromosome fragile sites. Heat map indicating the resulting P values from Fisher’s exact tests comparing the overlap of cytobands found in early, intermediate and late consensus peaks with cytobands from previously reported chromosome fragile sites. Significant overlaps (Benjamini–Hochberg-adjusted P < 0.05) are indicated with an asterisk (see Methods, ‘Cancer-associated gene and fragile site enrichment’). f, Prevalence of SNVs and indels in cancer-associated genes. Heat map displaying the proportion of samples from each tumour type with an SNV or indel in the corresponding cancer-associated gene. Yellow asterisks indicate where the SNVs and indels are present clonally in ≥75% of tumours in the corresponding tumour type.
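
The enrichment panels combine two-sided Fisher’s exact tests with Benjamini–Hochberg adjustment. A minimal sketch of both steps is given below, assuming a 2×2 table of gene (or cytoband) overlap counts; the function names and the example numbers are hypothetical and are not taken from the paper.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= observed * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg-adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical overlap table: rows = in peak / not in peak,
# columns = known cancer gene / other gene.
p_single = fisher_exact_two_sided(3, 1, 1, 3)

# Hypothetical raw P values for four comparisons.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [p < 0.05 for p in adjusted]
```

In a setting like the one described, each heat-map cell would supply one raw P value, and the asterisk would mark cells whose adjusted value falls below 0.05.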

Extended Data Fig. 9 Recurrent parallel evolution and LOH across the genome.

a, Across-genome plot showing the frequency of parallel gain/amplification events in red and frequency of parallel LOH events in blue. The dashed red lines indicate the significance threshold determined by a permutation test. b, Example of parallel evolution on chromosome 1 in CRUK0005. log2[R], B-allele frequency (BAF) and allele-specific expression (ASE) plots are shown for chromosome 1 in samples 3 and 4. On the phylogenetic tree, we indicate the branches in which the parallel gains of chromosome 1 were identified. c, Correlating intra-tumour heterogeneity (ITH) for each gene at the DNA and RNA levels. The scatter plot shows that the percentage of expressed genes with allele-specific DNA intratumour heterogeneity correlates with the percentage of expressed genes with allele-specific RNA intratumour heterogeneity. Only the 43 tumours, for which we had paired multi-sample exome-sequencing and multi-sample RNA sequencing data, were included in this analysis. d, Prevalence of single haploid copies in WGD tumours. Across-genome plot showing the frequency of loss to a single haploid copy in WGD tumours at the cytoband level. Clonal loss to a single haploid copy is shown in grey. Subclonal loss to a single haploid copy is shown in orange. The solid black line indicates the total frequency, including both clonal and subclonal events, of loss to a single haploid copy. HLA LOH is not shown as only the whole-exome sequencing subset of our cohort could be analysed using the LOHHLA bioinformatics tool (see Methods, ‘HLA LOH detection’). e, Prevalence of LOH in WGD tumours. This across-genome plot at the cytoband level shows the proportion of tumours with LOH. The solid black line indicates the total proportion of tumours with either subclonal or clonal LOH; the yellow shading indicates the proportion of tumours with WGD in the cohort that had subclonal LOH at these cytobands. The dashed grey lines demarcate the borders between separate chromosomes. 
f, Prevalence of HLA LOH across tumour types. We indicate for each tumour type the count and proportion of tumours in which HLA LOH was observed. Dark grey and orange bars show tumours for which HLA LOH was observed clonally or subclonally, respectively; light grey bars show tumours for which no HLA LOH was observed.

Extended Data Fig. 10 SCNAs in metastatic samples.

a, Beeswarm plot indicating the total proportion of the genome affected by either clonal or subclonal SCNAs in primary tumour samples (red dots) or metastatic samples (blue dots). The black bars indicate the median of the distribution. A two-sided unpaired Student’s t-test was used in this comparison; the P value and effect size (es) are shown. b, Difference in the percentage of the genome affected by SCNAs between paired metastatic and primary tumour samples (n = 152). The waterfall plot shows whether a greater or lesser proportion of the genome was affected by total SCNAs in the primary or metastatic sample(s) of tumours with at least one primary tumour sample and at least one metastatic sample. Purple bars indicate that a greater proportion of the genome was affected by total SCNAs in the metastatic sample and pink bars indicate a greater proportion was affected in the primary tumour sample. A two-sided paired Student’s t-test was used for this comparison. c, Beeswarm plots indicating, for each primary tumour and metastatic sample, the proportion of the genome impacted by SCNAs. These are the same samples included in the analysis of a. The black bars indicate the median of the distribution. Two-sided unpaired Student’s t-tests were used for each comparison; P values are indicated at the top of each plot. d, Beeswarm plots indicating for each primary tumour and metastatic sample the proportion of SCNAs that is subclonal. These are the same samples included in the analysis of a. The black bars show the median of the distribution. Two-sided unpaired Student’s t-tests were used for each comparison; P values are indicated at the top of each plot. e, Shared and private primary tumour and metastatic LOH. Bar plots separated by tumour type with each stacked bar representing the LOH identified in a single tumour sample with both primary tumour and metastatic samples. 
Each bar is coloured according to the proportion of LOH identified in that tumour that is shared between the primary tumour and metastatic samples (blue), the proportion of LOH present only in primary tumour samples (green) or the proportion of LOH present only in metastatic samples (red). The grey horizontal lines show the median value of the proportion of LOH shared between primary tumour and metastatic samples for each tumour type. f–i, Chromosomal arm-level events enriched in metastatic samples. We included only the four tumour types with >10 tumours with paired primary tumour–metastatic samples: LUAD (f), ER+ breast cancer (g), HER2+ breast cancer (h) and KIRC (i). In each panel, all chromosome arms are featured. The bar plots show the number of tumours with arm-level SCNAs in each tumour type. The colour of the bars indicates whether that arm-level event was enriched, depleted or maintained in the metastatic sample when compared with the corresponding primary tumour sample from the disease of the same patient. Bars facing right represent gain SCNAs; bars facing left represent loss SCNAs. The rectangular blocks between the bar plots indicate whether the arm-level events were recurrent events. Orange blocks represent recurrent subclonal events; grey blocks represent recurrent clonal events; blocks that are partially grey and partially orange represent events that are clonally and subclonally recurrent. The asterisks indicate whether the arm-level event is significantly enriched in metastatic samples in the combined paired (two-sided binomial test) and unpaired (test of equal or given proportions) primary tumour–metastatic analysis.

Supplementary information

Supplementary Information

This file contains the Supplementary Methods.

Reporting Summary

Supplementary Table 1

Cohort data.

Supplementary Table 2

Mitotic Count and Anisonucleosis Classification.

Supplementary Table 3

Consensus Peak Regions.

Supplementary Table 4

Recurrent Arm Events.

Cite this article

Watkins, T.B.K., Lim, E.L., Petkovic, M. et al. Pervasive chromosomal instability and karyotype order in tumour evolution. Nature 587, 126–132 (2020). https://doi.org/10.1038/s41586-020-2698-6
