Abstract
Cap analysis of gene expression (CAGE) was developed to detect the 5′ ends of RNAs. Trapping the 5′-cap structure of RNA enables the enrichment and selective sequencing of complete transcripts. Upscaled, high-throughput versions of CAGE have enabled the genome-wide identification of transcription start sites, including those at transcriptionally active promoters and enhancers. CAGE sequencing can therefore be used to draw comprehensive maps of active genomic regulatory elements in a cell type- and activation-specific manner. In humans, the cells of the immune system are among the best candidates for such analyses because they are easily accessible. In this review, we discuss how CAGE data are instrumental for integrative analyses with quantitative trait loci and omics data, and how useful they are for the mechanistic interpretation of the effects of genetic variation across the entire human genome. Integrating CAGE data with currently available omics information will contribute to a better understanding of the genome-wide association study variants that lie outside annotated genes, deepening our knowledge of human diseases and enabling the targeted design of more specific therapeutic interventions.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
This article is a contribution to the special issue on: Genetics and functional genetics of Autoimmune diseases - Guest Editors: Yukinori Okada & Kazuhiko Yamamoto
About this article
Cite this article
Guerrini, M.M., Oguchi, A., Suzuki, A. et al. Cap analysis of gene expression (CAGE) and noncoding regulatory elements. Semin Immunopathol 44, 127–136 (2022). https://doi.org/10.1007/s00281-021-00886-5
DOI: https://doi.org/10.1007/s00281-021-00886-5