Transcriptional regulation in plants: Using omics data to crack the cis-regulatory code

https://doi.org/10.1016/j.pbi.2021.102058

Abstract

Innovative omics technologies, advanced bioinformatics, and machine learning methods are rapidly becoming integral tools for plant functional genomics, a field that has advanced tremendously in recent years. In transcriptional regulation research, the initial lag in the accumulation of plant omics data relative to animal data stimulated the development of computational methods capable of extracting maximum information from the available data sets. Recent comprehensive studies of transcription factor–binding profiles in Arabidopsis and maize, together with the accumulation of uniformly processed omics data in public databases, have brought plant biologists into the big leagues, with many cutting-edge methods now available. Here, we summarize the state-of-the-art bioinformatics approaches used to predict or infer the cis-regulatory code underlying transcriptional gene regulation, focusing on their applications in plant research.

Introduction

The rapid accumulation of omics data over the last decade provides an unprecedented opportunity for the system-level reconstruction of the molecular mechanisms governing plant growth and development [1,2,3∗∗,4]. Of all the regulatory events that shape the phenotypic manifestation of the plant genome, transcriptional regulation of gene expression is the fountainhead. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) enables genome-wide annotation of transcription factor (TF)–binding events. More recently, two developments have made large-scale profiling of TF binding feasible: the DNA affinity purification and sequencing (DAP-seq) approach [5∗∗] and an efficient protoplast isolation and transformation system for expressing epitope-tagged TFs for ChIP-seq [6∗∗]. Applying these techniques revealed the binding profiles of a large number of Arabidopsis and maize TFs [5∗∗,6∗∗].

Deciphering the cis-regulatory code from these omics data offers opportunities to uncover condition-specific regulatory programs [7, 8, 9, 10]. The main challenge is that DNA sequence is only one of many parameters that determine whether a TF binds its target DNA and whether the binding event affects gene expression. One must also consider epigenetic marks, chromatin accessibility, the abundances of TFs and their interactors, the specificity of protein–DNA biophysical interactions (which might differ between dimers and multimers of different compositions), and other factors that contribute to triggering a regulatory program. Nonetheless, detecting TF-binding sites (TFBSs) in gene regulatory regions is a prerequisite for studying regulatory events, including the cooperative action of TFs, which guide condition-specific gene expression [8, 9, 10]. This review summarizes the state-of-the-art bioinformatics approaches used to extract the maximum information from regulatory sequences to understand transcriptional regulatory patterns, with a focus on applications in plant research.

Section snippets

Input data: whole-genome annotation data and binding site models

Various genome-wide data sets can be used as input to characterize the functions of the transcriptional machinery and elucidate regulatory patterns (Figure 1). In addition, TFBS models provide valuable input for understanding the principles of transcriptional regulation.
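The most widely used TFBS model is the position weight matrix (PWM); the reference list also points to alternatives such as k-mer set memory, Bayesian Markov models, binding energy models, and DNA shape motifs. As a minimal sketch of how a PWM is built and applied (our illustration, not a method from this review), the Python below converts per-position base counts into log2-odds weights against a background and scores every window on both strands of a sequence. The uniform background, the pseudocount of 0.5, the score threshold, and the toy count matrix (loosely modeled on the TGTCTC auxin response element core) are all hypothetical choices.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq[::-1].translate(COMPLEMENT)


def pwm_from_counts(counts, pseudocount=0.5, background=None):
    """Convert per-position base counts into log2-odds weights vs. a background."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumption: uniform background
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background[b])
            for b in BASES
        })
    return pwm


def scan(seq, pwm, threshold):
    """Score every window on both strands; keep (start, strand, score) at or above threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        for strand, site in (("+", window), ("-", revcomp(window))):
            score = sum(pwm[j][site[j]] for j in range(width))
            if score >= threshold:
                hits.append((i, strand, score))
    return hits


# Hypothetical toy matrix: 10 observed sites, all matching the TGTCTC consensus.
toy_counts = [{b: (10 if b == c else 0) for b in BASES} for c in "TGTCTC"]
toy_pwm = pwm_from_counts(toy_counts)
hits = scan("AATGTCTCAAGAGACAA", toy_pwm, threshold=8.0)
for start, strand, score in hits:
    print(start, strand, round(score, 2))
```

On this toy input the scan reports the forward-strand site at position 2 and the reverse-strand site (GAGACA, the reverse complement of TGTCTC) at position 10. In practice the threshold is usually chosen from the score distribution on background sequences rather than fixed by hand.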

From a single TFBS to a cis-regulatory syntax

A standard step in analyzing a single ChIP-seq peak set is de novo motif discovery, which detects overrepresented sequence patterns (motifs) using tools such as HOMER [20]. Besides identifying the most common variant of the TFBS, de novo motif discovery reveals other overrepresented motifs. These may include minor variants of the TFBS that reflect specific regulatory modes. The crystal structures of the TF–DNA complexes recently solved for the developmental
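The core idea behind de novo motif discovery, testing whether a sequence pattern occurs in more peaks than expected from a background set, can be sketched in a few lines. The Python below is a simplified illustration under stated assumptions, not HOMER's algorithm (which is considerably more elaborate): it counts how many peak and background sequences contain each k-mer (merging a k-mer with its reverse complement), then applies a one-sided hypergeometric test. The k-mer length, the p-value cutoff, and the toy sequences are hypothetical, and no multiple-testing correction is applied.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Collapse a k-mer and its reverse complement into one strand-independent key."""
    return min(kmer, kmer[::-1].translate(COMPLEMENT))


def kmer_presence(seqs, k):
    """For each canonical k-mer, count the sequences that contain it at least once."""
    counts = {}
    for seq in seqs:
        seen = {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)
                if set(seq[i:i + k]) <= set("ACGT")}  # ignore windows with N
        for kmer in seen:
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def hypergeom_sf(x, total, successes, draws):
    """P(X >= x) for X ~ Hypergeometric(total, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(math.comb(successes, i) * math.comb(total - successes, draws - i)
               for i in range(x, upper + 1))
    return tail / math.comb(total, draws)


def enriched_kmers(peaks, background, k=6, max_p=0.01):
    """Rank k-mers over-represented in peak sequences relative to background sequences."""
    fg, bg = kmer_presence(peaks, k), kmer_presence(background, k)
    total = len(peaks) + len(background)
    results = []
    for kmer, x in fg.items():
        with_kmer = x + bg.get(kmer, 0)  # peak + background sequences containing the k-mer
        p = hypergeom_sf(x, total, with_kmer, len(peaks))
        if p <= max_p:
            results.append((kmer, x, bg.get(kmer, 0), p))
    return sorted(results, key=lambda r: r[3])


# Hypothetical toy data: peaks carry TGTCTC, background carries TGACTC instead.
peaks = ["AAAATGTCTCAAAA"] * 10
background = ["AAAATGACTCAAAA"] * 10
for kmer, n_fg, n_bg, p in enriched_kmers(peaks, background):
    print(kmer, n_fg, n_bg, f"{p:.2e}")
```

Here the six 6-mers overlapping the TGTCTC core (reported as GAGACA, its canonical form) are enriched, while flanking k-mers shared with the background are not. Real tools then merge overlapping enriched k-mers into a single motif model such as a PWM.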

Searching for TF-binding targets

To identify TF target genes from ChIP-seq data, it is necessary to associate the peak positions with the gene regulatory regions. The regulatory regions predominantly localize upstream of the transcription start site; however, they are also often found downstream of a gene, in introns, in untranslated regions, and even in protein-coding sequences. Such ‘non-upstream’ regulatory regions were identified in the TERMINAL FLOWER 1 (TFL1), AGAMOUS-LIKE 6 (AGL6), and LEAFY (LFY) floral development
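Because regulatory regions are not confined to promoters, a peak-to-gene assignment should look beyond the sequence upstream of the transcription start site (TSS). As a minimal sketch, assuming peak summits and gene coordinates share one coordinate system and using hypothetical window sizes (3 kb upstream, 1 kb downstream), the Python below assigns each summit to every gene whose extended window contains it and labels the region in a strand-aware way; "gene body" covers introns, UTRs, and coding sequence alike.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    start: int   # leftmost coordinate
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"


def classify(summit, gene, upstream, downstream):
    """Return the region of `gene` containing `summit`, or None if outside the window."""
    if gene.strand == "+":
        rel_tss, rel_tes = summit - gene.start, summit - gene.end
    else:  # on the minus strand the TSS is the rightmost coordinate
        rel_tss, rel_tes = gene.end - summit, gene.start - summit
    # Negative rel_tss: summit lies 5' of the TSS on the gene's own strand.
    if -upstream <= rel_tss < 0:
        return "upstream"
    if rel_tss >= 0 and rel_tes <= 0:
        return "gene body"
    if 0 < rel_tes <= downstream:
        return "downstream"
    return None


def assign_peaks(summits, genes, upstream=3000, downstream=1000):
    """Map each (chrom, summit) to every gene whose extended window contains it."""
    hits = []
    for chrom, summit in summits:
        for gene in genes:
            if gene.chrom != chrom:
                continue
            region = classify(summit, gene, upstream, downstream)
            if region is not None:
                hits.append((chrom, summit, gene.name, region))
    return hits


# Hypothetical toy annotation: one plus-strand and one minus-strand gene.
genes = [Gene("g1", "chr1", 1000, 2000, "+"), Gene("g2", "chr1", 5000, 8000, "-")]
summits = [("chr1", 500), ("chr1", 1500), ("chr1", 2500),
           ("chr1", 9000), ("chr1", 4500), ("chr2", 1500)]
for hit in assign_peaks(summits, genes):
    print(hit)
```

Assigning a peak to every gene in range, rather than only to the nearest TSS, deliberately keeps ambiguous cases visible; downstream filtering (for example, by expression response) can then resolve them.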

Does gene regulation follow from TF binding?

TF–DNA binding near a gene does not necessarily change the gene's transcription. Therefore, it is crucial to test whether the presence of the cis-regulatory element is associated with a transcriptional response. Coupling TF-binding targets deduced from the ChIP-seq peak set with data on differential gene expression (e.g. triggered by TF perturbation in knockout or overexpressing transgenic lines) is a straightforward approach to distinguishing regulatory TF-binding events [37,45] (Figure 2).
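A common way to quantify this coupling is an over-representation test: are TF-bound genes more frequent among differentially expressed genes (DEGs) than expected by chance? The Python below is a minimal sketch using a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); the gene identifiers and set sizes are hypothetical. Genes that are both bound and responsive are returned as candidate direct targets.

```python
import math


def overlap_enrichment(bound, responsive, universe):
    """One-sided hypergeometric test for over-representation of TF-bound genes among
    differentially expressed genes, restricted to a shared gene universe."""
    universe = set(universe)
    bound, responsive = set(bound) & universe, set(responsive) & universe
    total, n_bound, n_resp = len(universe), len(bound), len(responsive)
    overlap = bound & responsive
    x = len(overlap)
    tail = sum(math.comb(n_bound, i) * math.comb(total - n_bound, n_resp - i)
               for i in range(x, min(n_bound, n_resp) + 1))
    return {
        "overlap": x,
        "expected": n_bound * n_resp / total,
        "p_value": tail / math.comb(total, n_resp),
        "bound_and_responsive": sorted(overlap),  # candidate direct targets
    }


# Hypothetical toy data: 20 genes, 10 with a peak, 5 DEGs in a TF mutant line.
genes = [f"g{i:02d}" for i in range(20)]
result = overlap_enrichment(bound=genes[:10], responsive=genes[:5], universe=genes)
print(result)
```

The choice of universe matters: restricting it to genes actually measured in the expression experiment avoids inflating significance with genes that could never have been called differentially expressed.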

Machine learning applications in plant functional genomics

Machine learning (ML) methods are a promising alternative to inferential statistics for deducing the cis-regulatory code from whole-genome annotation data [55,56]. ML offers a hypothesis-free approach to modeling the regulatory outcome by learning transcriptional regulation signatures directly from functional genomics data sets. Because it can take into account the contributions of numerous features, this approach is well suited to predicting regulatory TF-binding events and gene expression. For this,
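To make the idea of learning regulatory signatures from features concrete, the Python below trains a deliberately minimal classifier: a logistic regression on k-mer frequency features, fitted by full-batch gradient descent. It is an illustrative stand-in, not a model from the cited studies, which typically use richer architectures and feature sets (for example, chromatin accessibility or histone marks alongside sequence). The labels, sequences, learning rate, epoch count, and regularization strength are all hypothetical.

```python
import math
from itertools import product


def kmer_features(seq, k=2):
    """Frequencies of all 4**k k-mers in a fixed order (a simple sequence encoding)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    vec = [0.0] * len(kmers)
    n_windows = max(len(seq) - k + 1, 1)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # windows containing N are ignored
            vec[j] += 1.0 / n_windows
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=500, l2=1e-3):
    """Full-batch gradient descent on the L2-regularized logistic (log) loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that a region is regulatory (label 1)."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical toy labels: 1 = bound region whose nearby gene responds, 0 = it does not.
train_seqs = ["GCGCGCGC", "CGCGCGCG", "ATATATAT", "TATATATA"]
labels = [1, 1, 0, 0]
w, b = train_logistic([kmer_features(s) for s in train_seqs], labels)
print(round(predict_proba(w, b, kmer_features("GCGCGGCC")), 3))
print(round(predict_proba(w, b, kmer_features("ATTATAAT")), 3))
```

Because the learned weights map back to individual k-mers, a linear model like this stays interpretable; more expressive models trade some of that transparency for accuracy and usually need held-out data to check that they generalize.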

Future prospects

Because the cis-regulatory code is highly complex and manifests itself very specifically in each cell type, further progress in its comprehensive reconstruction requires much more high-quality data from model plant species. In this regard, a plant initiative similar to the human ENCODE project would be invaluable [12∗, 67]. Rapid technological advances in single-cell multiomics provide powerful new tools for studying cis-regulatory patterns at single-cell resolution. An

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors thank Tatyana Merkulova and Pavel Borodin for fruitful discussions and anonymous reviewers for valuable advice. This work was supported by the Russian Science Foundation, grant no. 20-14-00140.

References (70)

  • J. Szymański et al.

    Analysis of wild tomato introgression lines elucidates the genetic basis of transcriptome and metabolome variation underlying fruit traits and pathogen response

    Nat Genet

    (2020)
  • M. Zander et al.

    Integrated multi-omics framework of the plant response to jasmonic acid

    Nat Plants

    (2020)
  • M. Muthamilarasan et al.

    Multi-omics approaches for strategic improvement of stress tolerance in underutilized crop species: a climate change perspective

    Adv Genet

    (2019)
  • X. Lai et al.

    Building transcription factor binding site models to understand gene regulation in plants

    Mol Plant

    (2019)
  • Y. Nie et al.

    Cooperative binding of transcription factors in the human genome

    Genomics

    (2020)
  • E. Morgunova et al.

    Structural perspective of cooperative transcription factor binding

    Curr Opin Struct Biol

    (2017)
  • D. Shi et al.

    Tissue-specific transcriptome profiling of the Arabidopsis inflorescence stem reveals local cellular signatures

    Plant Cell

    (2020)
  • Y. Guo et al.

    A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction

    Genome Res

    (2018)
  • M. Siebert et al.

    Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences

    Nucleic Acids Res

    (2016)
  • S. Ruan et al.

    BEESEM: estimation of binding energy models using HT-SELEX data

    Bioinformatics

    (2017)
  • M.A.H. Samee et al.

    A De Novo Shape motif discovery algorithm reveals preferences of transcription factors for DNA shape beyond sequence motifs

    Cell Syst

    (2019)
  • S. Heinz et al.

    Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities

    Mol Cell

    (2010)
  • J. Sloan et al.

    Structural basis for the complex DNA binding behavior of the plant stem cell regulator WUSCHEL

    Nat Commun

    (2020)
  • A. Freire-Rios et al.

    Architecture of DNA elements mediating ARF transcription factor binding and auxin-responsive gene expression in Arabidopsis

    Proc Natl Acad Sci U S A

    (2020)
  • B.A. Krizek et al.

    The Arabidopsis transcription factor AINTEGUMENTA orchestrates patterning genes and auxin signaling in the establishment of floral growth and form

    Plant J

    (2020)
  • V. Levitsky et al.

    A single ChIP-seq dataset is sufficient for comprehensive analysis of motifs co-occurrence with MCOT package

    Nucleic Acids Res

    (2019)
  • M. Galli et al.

    The DNA binding landscape of the maize AUXIN RESPONSE FACTOR family

    Nat Commun

    (2018)
  • T. Whitington et al.

    Inferring transcription factor complexes from ChIP-seq data

    Nucleic Acids Res

    (2011)
  • V.V. Mironova et al.

    Computational analysis of auxin responsive elements in the Arabidopsis thaliana L. genome

    BMC Genom

    (2014)
  • M.E. Smit et al.

    Specification and regulation of vascular tissue identity in the Arabidopsis embryo

    Development

    (2020)
  • A. Jolma et al.

    DNA-binding specificities of human transcription factors

    Cell

    (2013)
  • F.X. Wang et al.

    Chromatin accessibility dynamics and a hierarchical transcriptional regulatory network structure for plant somatic embryogenesis

    Dev Cell

    (2020)
  • S.E. Schauer et al.

    Intronic regulatory elements determine the divergent expression patterns of AGAMOUS-LIKE6 subfamily members in Arabidopsis

    Plant J

    (2009)
  • A. Serrano-Mislata et al.

    Separate elements of the TERMINAL FLOWER 1 cis-regulatory region integrate pathways to control flowering time and shoot meristem identity

    Development

    (2016)
  • Y. Zhu et al.

    TERMINAL FLOWER 1-FD complex target genes and competition with FLOWERING LOCUS T

    Nat Commun

    (2020)