TooManyCells identifies and visualizes relationships of single-cell clades

Abstract

Identifying and visualizing transcriptionally similar cells is instrumental for accurate exploration of the cellular diversity revealed by single-cell transcriptomics. However, widely used clustering and visualization algorithms produce a fixed number of cell clusters. A fixed clustering ‘resolution’ hampers our ability to identify and visualize echelons of cell states. We developed TooManyCells, a suite of graph-based algorithms for efficient and unbiased identification and visualization of cell clades. TooManyCells introduces a visualization model built on a concept intentionally orthogonal to dimensionality-reduction methods. TooManyCells is also equipped with an efficient matrix-free divisive hierarchical spectral clustering different from prevalent single-resolution clustering methods. TooManyCells enables multiresolution and multifaceted exploration of single-cell clades. An advantage of this paradigm is the immediate detection of rare and common populations, in which TooManyCells outperforms popular clustering and visualization algorithms, as demonstrated using existing single-cell transcriptomic data sets and new data modeling drug-resistance acquisition in leukemic T cells.

Fig. 1: The TooManyCells visualization and clustering algorithms.
Fig. 2: Example of TooManyCells visualization capabilities using 11 mouse organs.
Fig. 3: Comparative analysis of clustering performance and scalability.
Fig. 4: Detection of cells from two rare populations mixed with a common population was benchmarked for widely used clustering algorithms.
Fig. 5: TooManyCells stratifies rare plasmablasts in mouse spleen.
Fig. 6: TooManyCells identifies GSI-resistant cell heterogeneity and detects resistant-like T-ALL cells.

Data availability

The accession number for the new data sets reported in this paper is Gene Expression Omnibus (GEO): GSE138892. Microfluidics single-cell RNA-seq count data from 11 organs of seven 3-month-old C57BL/6 NIA mice (3 female, 4 male) were obtained from https://figshare.com/articles/_/5715025; P8 libraries were removed because of outlier cell counts (ref. 22). FACS-purified CD14+ monocytes, CD19+ B cells and CD4+ T cells were obtained from https://support.10xgenomics.com/single-cell-gene-expression/datasets (ref. 23). Data for seven cancer cell lines were obtained from GSE81861 (ref. 17). FACS-purified B lymphocyte/natural killer, megakaryocyte-erythroid and granulocyte-monocyte progenitors were obtained from GSE117498 (ref. 25).

Code availability

TooManyCells is available at https://github.com/faryabib/too-many-cells or as a Docker image https://cloud.docker.com/repository/docker/gregoryschwartz/too-many-cells/. An R wrapper for TooManyCells is available at https://cran.r-project.org/web/packages/TooManyCellsR. BirchBeer is available at https://github.com/faryabib/birch-beer or as a Docker image https://cloud.docker.com/repository/docker/gregoryschwartz/birch-beer. Code necessary to reproduce the presented analyses is available at https://github.com/faryabib/NatMethods_TooManyCells_analysis.

References

  1. Lafzi, A., Moutinho, C. & Picelli, S. Tutorial: guidelines for the experimental design of single-cell RNA sequencing studies. Nat. Protoc. 13, 2742 (2018).

  2. Trapnell, C. Defining cell types and states with single-cell genomics. Genome Res. 25, 1491–1498 (2015).

  3. Packer, J. & Trapnell, C. Single-cell multi-omics: an engine for new quantitative models of gene regulation. Trends Genet. 34, 653–665 (2018).

  4. Liu, S. & Trapnell, C. Single-cell transcriptome sequencing: recent advances and remaining challenges. F1000Res 5, F1000 (2016).

  5. Svensson, V., Vento-Tormo, R. & Teichmann, S. A. Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 13, 599–604 (2018).

  6. Levine, J. H. et al. Data-driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell 162, 184–197 (2015).

  7. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).

  8. Qiu, X. et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods 14, 979–982 (2017).

  9. Azizi, E., Prabhakaran, S., Carr, A. & Pe’er, D. Bayesian inference for single-cell clustering and imputing. Genomics Comput. Biol. 3, 46 (2017).

  10. Ho, Y.-J. et al. Single-cell RNA-seq analysis identifies markers of resistance to targeted BRAF inhibitors in melanoma cell populations. Genome Res. 28, 1353–1363 (2018).

  11. Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  12. McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2018).

  13. Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44 (2019).

  14. Nutt, S. L., Hodgkin, P. D., Tarlinton, D. M. & Corcoran, L. M. The generation of antibody-secreting plasma cells. Nat. Rev. Immunol. 15, 160–171 (2015).

  15. Zeisel, A. et al. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 347, 1138–1142 (2015).

  16. Lin, P., Troup, M. & Ho, J. W. K. CIDR: ultrafast and accurate clustering through imputation for single-cell RNA-seq data. Genome Biol. 18, 59 (2017).

  17. Li, H. et al. Reference component analysis of single-cell transcriptomes elucidates cellular heterogeneity in human colorectal tumors. Nat. Genet. 49, 708–718 (2017).

  18. Zappia, L. & Oshlack, A. Clustering trees: a visualization for evaluating clusterings at multiple resolutions. Gigascience 7, 7–9 (2018).

  19. Newman, M. E. J. & Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 (2004).

  20. Lancichinetti, A. & Fortunato, S. Limits of modularity maximization in community detection. Phys. Rev. E 84, 066122 (2011).

  21. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).

  22. The Tabula Muris Consortium et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).

  23. Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).

  24. Herman, J. S., Sagar & Grün, D. FateID infers cell fate bias in multipotent progenitors from single-cell RNA-seq data. Nat. Methods 15, 379–386 (2018).

  25. Pellin, D. et al. Comprehensive single cell transcriptional landscape of human hematopoietic progenitors. Nat. Commun. 10, 1–15 (2019).

  26. Dahlin, J. S. et al. A single-cell hematopoietic landscape resolves 8 lineage trajectories and defects in Kit mutant mice. Blood 131, e1–e11 (2018).

  27. Borges da Silva, H. et al. Splenic macrophage subsets and their function during blood-borne infections. Front. Immunol. 6, 480 (2015).

  28. Den Haan, J. M. M. & Kraal, G. Innate immune functions of macrophage subpopulations in the spleen. J. Innate Immun. 4, 437–445 (2012).

  29. Hey, Y. Y. & O’Neill, H. C. Murine spleen contains a diversity of myeloid and dendritic cells distinct in antigen presenting function. J. Cell. Mol. Med. 16, 2611–2619 (2012).

  30. Jojic, V. et al. Identification of transcriptional regulators in the mouse immune system. Nat. Immunol. 14, 633–643 (2013).

  31. Winter, S. S. et al. Improved survival for children and young adults with T-lineage acute lymphoblastic leukemia: results from the Children’s Oncology Group AALL0434 methotrexate randomization. J. Clin. Oncol. 36, 2926–2934 (2018).

  32. Marks, D. I. et al. T-cell acute lymphoblastic leukemia in adults: clinical features, immunophenotype, cytogenetics, and outcome from the large randomized prospective trial (UKALL XII/ECOG 2993). Blood 114, 5136–5145 (2009).

  33. Aster, J. C., Pear, W. S. & Blacklow, S. C. The varied roles of Notch in cancer. Annu. Rev. Pathol. Mech. Dis. 12, 245–275 (2017).

  34. Knoechel, B. et al. An epigenetic mechanism of resistance to targeted therapy in T cell acute lymphoblastic leukemia. Nat. Genet. 46, 364–370 (2014).

  35. Dluzen, D., Li, G., Tacelosky, D., Moreau, M. & Liu, D. X. BCL-2 is a downstream target of ATF5 that mediates the prosurvival function of ATF5 in a cell type-dependent manner. J. Biol. Chem. 286, 7705–7713 (2011).

  36. Yamazaki, T. et al. Regulation of the human CHOP gene promoter by the stress response transcription factor ATF5 via the AARE1 site in human hepatoma HepG2 cells. Life Sci. 87, 294–301 (2010).

  37. Liu, D. X., Qian, D., Wang, B., Yang, J.-M. & Lu, Z. P300-dependent ATF5 acetylation is essential for Egr-1 gene activation and cell proliferation and survival. Mol. Cell. Biol. 31, 3906–3916 (2011).

  38. Angelastro, J. M. Targeting ATF5 in cancer. Trends Cancer 3, 471–474 (2017).

  39. Karpel-Massler, G. et al. A synthetic cell-penetrating dominant-negative ATF5 peptide exerts anticancer activity against a broad spectrum of treatment-resistant cancers. Clin. Cancer Res. 22, 4698–4711 (2016).

  40. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).

  41. Liberzon, A. et al. The molecular signatures database hallmark gene set collection. Cell Syst. 1, 417–425 (2015).

  42. Shu, L., Chen, A., Xiong, M. & Meng, W. Efficient spectral neighborhood blocking for entity resolution. In 2011 IEEE 27th International Conference on Data Engineering 1067–1078 (IEEE, 2011).

  43. Shi, J. & Malik, J. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22, 18 (2000).

  44. Sparck Jones, K. A statistical interpretation of term specificity and its application in retrieval. J. Doc. 28, 11–21 (1972).

  45. Manning, C. D., Raghavan, P. & Schütze, H. Introduction to Information Retrieval (Cambridge University Press, 2008).

  46. Salton, G., Wong, A. & Yang, C. S. A vector space model for automatic indexing. Commun. ACM 18, 613–620 (1975).

  47. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).

  48. Hill, M. O. Diversity and evenness: a unifying notation and its consequences. Ecology 54, 427 (1973).

  49. Schwartz, G. W. & Hershberg, U. Conserved variation: identifying patterns of stability and variability in BCR and TCR V genes with different diversity and richness metrics. Phys. Biol. 10, 035005 (2013).

  50. Schwartz, G. W. & Hershberg, U. Germline amino acid diversity in B cell receptors is a good predictor of somatic selection pressures. Front. Immunol. 4, 357 (2013).

  51. Meng, W. et al. An atlas of B-cell clonal distribution in the human body. Nat. Biotechnol. 35, 879–884 (2017).

  52. Heck, K. L., van Belle, G. & Simberloff, D. Explicit calculation of the rarefaction diversity measurement and the determination of sufficient sample size. Ecology 56, 1459 (1975).

  53. Tian, L. et al. Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments. Nat. Methods 16, 479–487 (2019).

  54. Ronen, J. & Akalin, A. netSmooth: network-smoothing based imputation for single cell RNA-seq. F1000Res 7, 8 (2018).

  55. Dai, H., Li, L., Zeng, T. & Chen, L. Cell-specific network constructed by single-cell RNA sequencing data. Nucleic Acids Res. 47, e62 (2019).

  56. Tan, P.-N., Steinbach, M., Karpatne, A. & Kumar, V. Introduction to Data Mining 2nd edn (Pearson, 2019).

  57. Kvålseth, T. O. On normalized mutual information: measure derivations and properties. Entropy 19, 631 (2017).

  58. Zappia, L., Phipson, B. & Oshlack, A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 18, 174 (2017).

  59. Schwartz, G. W., Shokoufandeh, A., Ontañón, S. & Hershberg, U. Using a novel clumpiness measure to unite data with metadata: finding common sequence patterns in immune receptor germline V genes. Pattern Recognit. Lett. 74, 24–29 (2016).

  60. Wolock, S. L., Lopez, R. & Klein, A. M. Scrublet: computational identification of cell doublets in single-cell transcriptomic data. Cell Syst. 8, 281–291.e9 (2019).

  61. Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A. & Tyagi, S. Imaging individual mRNA molecules using multiple singly labeled probes. Nat. Methods 5, 877–879 (2008).

Acknowledgements

This work was supported by T32-CA-009140 (to G.W.S.), LLS-5456-17 (to J.P.), R01-CA-215518 (to W.S.P.), R01-HL-145754, the Penn Epigenetics pilot award and the Sloan Foundation Grant (to G.V.), Therapeutics Translational Medicine and Therapeutics program for Transdisciplinary Awards Program in Translational Medicine and Therapeutics, Concern Foundation’s The Conquer Cancer Now Award, Susan G. Komen CCR185472448 and R01-CA-230800 (to R.B.F.).

Author information

Contributions

Conceptualization: R.B.F., G.W.S.; Methodology: G.W.S., R.B.F.; Software: G.W.S.; Investigation: G.W.S., R.B.F., J.P., Y.Z.; Formal Analysis: G.W.S., R.B.F., J.P., M.F., S.M.S., L.X., Y.Z.; Resources and Reagents: R.B.F., G.V.; Writing, Review and Editing: G.W.S., R.B.F., W.S.P., J.P., Y.Z.; Writing, Original Draft: G.W.S., R.B.F.; Supervision: R.B.F.; Funding Acquisition: R.B.F.

Corresponding author

Correspondence to Robert B. Faryabi.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Peer review information: Nicole Rusk and Lin Tang were the primary editors on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figures 1–33 and Notes 1–6

Reporting Summary

Supplementary Table 1

Differential expression analysis between the ascending (n = 1,780 cells) and sustained (n = 2,299 cells) subtrees in Supplementary Fig. 33e. QL F-test used with Benjamini–Hochberg method for multiple-hypothesis correction.

Supplementary Table 2

Differential expression analysis between untreated (n = 2,338 cells) and short-term (n = 2,616 cells) populations in Supplementary Fig. 33i. QL F-test used with Benjamini–Hochberg method for multiple-hypothesis correction.

Supplementary Table 3

Differential expression analysis between parental (n = 4,954 cells) and sustained (n = 2,417 cells) populations in Supplementary Fig. 33j. QL F-test used with Benjamini–Hochberg method for multiple-hypothesis correction.

Supplementary Table 4

Differential expression analysis between other parental (n = 4,926 cells) and resistant-like (n = 28 cells) populations in Supplementary Fig. 33k. QL F-test used with Benjamini–Hochberg method for multiple-hypothesis correction.

Supplementary Table 5

Differential expression analysis between sustained (n = 2,417 cells) and resistant-like (n = 28 cells) populations in Supplementary Fig. 33l. QL F-test used with Benjamini–Hochberg method for multiple-hypothesis correction.

Supplementary Table 6

Sequences of MYC and GAPDH RNA FISH probes.
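Supplementary Tables 1–5 report p-values from a quasi-likelihood (QL) F-test corrected with the Benjamini–Hochberg procedure. As a reminder of what that correction does, here is a minimal NumPy sketch of the standard step-up adjustment: sort the m p-values, scale the i-th smallest by m/i, then take a running minimum from the largest down so the adjusted values stay monotone. It illustrates only the correction step; the F-test itself and the authors' actual pipeline are not reproduced, and `benjamini_hochberg` is a hypothetical helper name.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                        # indices of p-values, smallest first
    scaled = p[order] * m / np.arange(1, m + 1)  # p_(i) * m / i
    # running minimum from the largest p-value down keeps the result monotone
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)    # cap at 1 and restore input order
    return adjusted
```

In R, `p.adjust(p, method = "BH")` performs the same adjustment.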

About this article

Cite this article

Schwartz, G.W., Zhou, Y., Petrovic, J. et al. TooManyCells identifies and visualizes relationships of single-cell clades. Nat Methods 17, 405–413 (2020). https://doi.org/10.1038/s41592-020-0748-5

  • DOI: https://doi.org/10.1038/s41592-020-0748-5