Computational approaches from polymer physics to investigate chromatin folding

https://doi.org/10.1016/j.ceb.2020.01.002

Abstract

Microscopy and sequencing-based technologies are providing increasing insights into chromatin architecture. Nevertheless, a full understanding of chromosome folding at the molecular level, and of its link with vital cell functions, is far from accomplished. Recent theoretical and computational approaches are providing important support to experiments in dissecting the three-dimensional structure of chromosomes and the mechanisms that organize it. Here, we review, in particular, the Strings&Binders (SBS) polymer model of chromatin, which describes the textbook scenario where contacts between distal DNA sites are established by cognate binders. The model has been shown to recapitulate key features of chromosome folding and to predict how phenotype-causing structural variants rewire the interactions between genes and regulators.

Introduction

The genome of higher eukaryotes is folded in the cell nucleus into complex three-dimensional (3D) conformations linked to essential biological functions, such as gene regulation and DNA replication, and its abnormal folding has been associated with congenital diseases [1,2]. Advances in experimental technologies are providing a wealth of information about the architecture of the genome (Figure 1); however, the molecular mechanisms involved remain an open question that is only starting to be answered. In the last decade, novel sequencing-based techniques have enabled the detailed exploration of how chromosomes fold at genomic scales [3, 4, 5], revealing that chromatin has a complex structure well beyond the nucleosome scale [3], with different levels of compartmentalization from the sub-Mb up to the chromosomal scale, encompassing A/B compartments, topologically associated domains (TADs), meta-TADs, and specific patterns such as loops and stripes [3,6,7]. Hypotheses have been made about how these structures relate to function; for instance, TADs are thought to favor proper gene–enhancer contacts. In addition, it has emerged that chromosome folding is highly dynamic, varying through the cell cycle [8,9] and differentiation [7,10], and across single cells in a population [11].

To make sense of this increasingly large and complex body of data, first-principles models have been developed that try to explain the variety of chromatin contact patterns within a coherent, mechanistic framework [12, 13, 14, ∗∗15, ∗16, 17, 18, 19, 20, 21, 22, 23, 24]. In parallel, data-driven approaches try to answer the question of how the genome is structured in 3D and how that structure evolves in time, at different scales, purely by computational inference from experimental data [12,25, 26, 27]. Once properly validated against data, all these approaches allow us to predict aspects of 3D folding that are not yet accessible to experiments, and they have proved to be important tools to investigate chromatin folding (Figure 1). Here, we review some of the more recent advances from the point of view of polymer physics modeling, focusing on principled approaches and, in particular, on the Strings&Binders (SBS) model of chromatin. We illustrate through a number of examples the predictive power and range of applications of the model, and we try to highlight its advantages and limitations with respect to other approaches.

Models from basic polymer physics have been introduced, in particular, to try to identify the mechanisms of chromosome compartmentalization and pattern formation in a principled way [12,13,17].

One class of models explains chromatin folding as driven thermodynamically by homotypic interactions between DNA sites sharing compatible chromatin marks [15,18, 19, 20, 21,24,28]. Interactions between cognate sites can realistically be mediated by proteins that bind to multiple sites, and possibly to each other, which can induce phase separation of chromatin subcompartments [29]. Another possibility is the association of chromatin sites with nuclear anchors such as the nuclear lamina and nuclear speckles. Indeed, evidence supports a role for a number of bridging proteins in shaping chromatin architecture, including active and poised RNA polymerase II (Pol-II), polycomb repressive complex 1 (PRC1), and the transcription factor Yin Yang 1 (YY1) [30, 31, 32, 33], and some factors, such as transcription factors, Pol-II, Mediator, and HP1, have recently been observed to form phase-separated condensates [34, 35, 36, ∗∗37, ∗∗38, ∗∗39].

As an example of this class of models, here we focus on the SBS model, one of the first to be introduced [18] and shown to recapitulate high-throughput chromosome conformation capture (Hi-C), genome architecture mapping (GAM), and microscopy data across chromosomal scales and cell types [7,15,19,24,30]. The SBS model describes a chromatin filament as a self-avoiding string of beads, along which specific beads act as binding sites for cognate diffusing binders that can bridge them, thus driving folding (Figure 2a). Different types of binding sites (colors) interact selectively, only with their cognate binders. The SBS model can be investigated computationally, for example, by molecular dynamics (MD) simulations, where beads and binders are subject to Langevin dynamics with classical interaction potentials [24]. The attractive interaction energy between binding sites and cognate binders, Eint, and the binder concentration, c, are the key parameters of the model. SBS models are particularly suitable to explain the formation of chromatin compartments. MD simulations of a simple toy SBS block copolymer with alternating blocks of two types of binding sites (Figure 2b) illustrate the concept: starting from a self-avoiding walk configuration, with Eint and c above a phase-transition threshold, beads within the same block first compartmentalize, favored by their genomic proximity; subsequently, long-range contacts established between different blocks of the same type lead to microphase segregation of same-type blocks and to the formation of the well-known checkerboard pattern in the average contact map [24].
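To make the ingredients of the toy block copolymer concrete, the following Python sketch builds a chain with alternating blocks of two binding-site types and evaluates a simple SBS-like potential energy. It is a minimal illustration under stated assumptions, not the force field of the cited simulations: the harmonic bonds, the purely repulsive Weeks–Chandler–Andersen (WCA) excluded volume, the truncated and shifted Lennard-Jones attraction between a binder and beads of its own color (with well depth e_int, playing the role of Eint), and all parameter values and function names (block_copolymer_colors, sbs_energy, etc.) are hypothetical choices.

```python
import numpy as np

# Hypothetical toy parameters in simulation units (not from the cited studies).
SIGMA = 1.0        # bead and binder diameter
K_BOND = 30.0      # harmonic bond stiffness
R_CUT_ATTR = 1.5   # cutoff of the binder-site attraction


def block_copolymer_colors(n_blocks, block_len):
    """Toy SBS chain: alternating blocks of binding-site types 0 and 1."""
    return np.repeat(np.arange(n_blocks) % 2, block_len)


def lj(r, eps):
    """Standard 12-6 Lennard-Jones potential."""
    sr6 = (SIGMA / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def pair_energy(r, cognate, e_int):
    """Non-bonded energy of one pair at distance r.

    Cognate binder-site pairs: LJ with well depth e_int (before the shift),
    truncated and shifted to zero at R_CUT_ATTR. All other pairs: purely
    repulsive WCA (LJ cut at its minimum 2**(1/6) * SIGMA, shifted up by 1).
    """
    if cognate:
        return lj(r, e_int) - lj(R_CUT_ATTR, e_int) if r < R_CUT_ATTR else 0.0
    r_wca = 2.0 ** (1.0 / 6.0) * SIGMA
    return lj(r, 1.0) + 1.0 if r < r_wca else 0.0


def sbs_energy(bead_pos, bead_color, binder_pos, binder_color, e_int):
    """Total potential energy of a toy SBS configuration (positions: (N, 3))."""
    # Harmonic bonds between consecutive beads, rest length SIGMA.
    bond_len = np.linalg.norm(np.diff(bead_pos, axis=0), axis=1)
    energy = 0.5 * K_BOND * np.sum((bond_len - SIGMA) ** 2)
    pos = np.vstack([bead_pos, binder_pos])
    color = np.concatenate([bead_color, binder_color])
    n_beads = len(bead_pos)
    is_binder = np.arange(len(pos)) >= n_beads
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            if j == i + 1 and j < n_beads:
                continue  # bonded neighbours interact only through the bond
            r = np.linalg.norm(pos[i] - pos[j])
            # Attraction only between a binder and a bead of the binder's color.
            cognate = bool(is_binder[i] != is_binder[j]) and color[i] == color[j]
            energy += pair_energy(r, cognate, e_int)
    return float(energy)
```

In an MD study, forces derived from an energy of this kind would drive the Langevin dynamics of beads and binders; in this picture the concentration c enters only through the number of binders placed in the simulation box, and raising e_int and c above threshold is what would produce the compartmentalization described above.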

By considering more complex polymer models, it is possible to describe finer Hi-C features and obtain highly realistic descriptions of the folding of chromatin loci. To this aim, one strategy is to specialize the polymer beads using known chromatin binding sites, such as those of the CCCTC-binding factor (CTCF), and epigenetic marks, such as histone modifications. Following this approach, the 3D folding of a number of genomic loci, such as HoxB, SOX2, and Pax6, has been recapitulated in mouse and human cell types, at the population and single-cell levels [16,30]. An alternative strategy is to use experimental contact data, such as Hi-C, to infer the polymer binding sites. Although the first approach has the advantage that epigenetic data are generally more available across organisms and cell types, the second can deduce 3D structures without any prior biological knowledge of chromatin binding sites and has the potential to discover unknown factors involved in chromosome organization. Following this second strategy, a computational inference method, PRISMR (polymer-based recursive statistical inference), has recently been developed to obtain the minimal SBS polymer model that best explains the data for any specific locus of interest. PRISMR takes as its only input a pairwise contact map of the locus of interest (for example, Hi-C) and, through a Monte Carlo procedure, returns the minimal number of different binding site types (colors), and their locations along the polymer chain, needed to explain the input data within a given accuracy [15]. This approach was successfully used to describe the folding of a number of loci in mouse and human cell types, including Sox9, EPHA4, Pitx1, Shh, and HoxD [15,24,∗40, 41, 42]. These loci have a crucial role in development, and their misfolding has been linked to disease [2,43]. Their Hi-C contact patterns are well reproduced by SBS models from below to above the TAD scale, as shown, for example, in Figure 3a for the EPHA4 locus in human fibroblasts.
To reproduce their folding, more binding site types were needed than the two used in the toy block copolymer example. Interestingly, the inferred binding site types were shown to correlate with different combinations of chromatin marks, including histone modifications, CTCF sites, and transcription factors. The model-derived conformations make it possible to map 2D Hi-C patterns into 3D space (Figure 3a) and to make predictions on aspects of folding beyond them. SBS model predictions were largely validated against independent 4C, fluorescence in situ hybridization (FISH), and multiway 4C data in wild-type (WT) and mutant loci [15,40,42].
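The logic of inferring binding-site types from a contact map by Monte Carlo can be illustrated with a toy simulated-annealing search. This is not PRISMR: its actual cost function and its polymer-based estimate of the model contact map are described in Ref. [15] and are not reproduced here. Instead, the sketch assumes a hypothetical surrogate in which the contact probability decays as 1/|i − j| with genomic separation and is enhanced by a fixed factor for beads sharing a color, and it adds a penalty lam per color used, so that extra colors pay off only when they improve the fit. All names and parameter values are illustrative.

```python
import numpy as np


def surrogate_map(colors, boost=2.0):
    """Hypothetical stand-in for the model contact map of a colored polymer.

    Contacts decay as 1/|i - j| (separation 0 is treated as 1 to avoid division
    by zero) and are enhanced by (1 + boost) for pairs of the same color.
    """
    idx = np.arange(len(colors))
    sep = np.abs(idx[:, None] - idx[None, :])
    base = 1.0 / np.maximum(sep, 1)
    same = colors[:, None] == colors[None, :]
    return base * np.where(same, 1.0 + boost, 1.0)


def cost(colors, target, lam):
    """Squared mismatch with the input map plus a penalty per color used."""
    mismatch = np.sum((surrogate_map(colors) - target) ** 2)
    return mismatch + lam * len(np.unique(colors))


def anneal_colors(target, k_max, lam=0.1, n_steps=5000, t0=1.0, seed=0):
    """Simulated annealing over color assignments with single-bead moves."""
    rng = np.random.default_rng(seed)
    n = target.shape[0]
    colors = rng.integers(k_max, size=n)          # random initial assignment
    current = cost(colors, target, lam)
    best, best_cost = colors.copy(), current
    for step in range(n_steps):
        temp = t0 * (1.0 - step / n_steps) + 1e-6  # linear cooling schedule
        i = rng.integers(n)
        old = colors[i]
        colors[i] = rng.integers(k_max)            # propose a new color for bead i
        new = cost(colors, target, lam)
        # Metropolis rule: accept improvements, accept worse moves with Boltzmann odds.
        if new <= current or rng.random() < np.exp(-(new - current) / temp):
            current = new
            if current < best_cost:
                best, best_cost = colors.copy(), current
        else:
            colors[i] = old                        # reject: restore the previous color
    return best, best_cost
```

For instance, a target built with surrogate_map from two blocks of ten beads can be passed to anneal_colors(target, k_max=3); the returned assignment is only as good as the cooling schedule, and color labels are interchangeable.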

In particular, the EPHA4 locus, where Lupiáñez et al. [44] showed that different structural variants (SVs) lead to different limb malformations in humans, was taken as a case study to demonstrate the ability of the SBS model to predict the effects of pathogenic SVs on chromatin folding. To this aim, SVs were implemented in silico on the SBS polymer model inferred by PRISMR from the WT EPHA4 locus (Figure 3a), and MD simulations were rerun to derive their average contact maps. As an example, Figure 3b shows the results for a 1.4 Mb heterozygous duplication (DupF) leading to syndactyly. The model predicted the formation of specific patterns of ectopic contacts in each mutant, and the predictions were successfully tested by independent capture Hi-C (cHi-C) performed on fibroblasts from patients carrying the SVs (Figure 3b). Importantly, the predicted ectopic patterns help to interpret the SVs functionally, in terms of rewiring of promoter–enhancer contacts. For example, in the case of the DupF duplication, as highlighted by the virtual 4C tracks in Figure 3c, ectopic interactions were predicted between the EPHA4 enhancers and the WNT6 gene promoter, which resulted in the ectopic WNT6 activation and disease observed in patients carrying the mutation [15,44]. The 3D structures of the model further help to visualize and interpret the impact of SVs on folding (Figure 3a and b). The PRISMR procedure is very general and can be applied to any genomic rearrangement to predict interactions and analyze its disease-causing potential without performing extensive Hi-C experiments. Another important application of the PRISMR approach was the analysis of structural changes of chromatin during cell differentiation and their link with function [40,42].
For example, applied to the Pitx1 locus in murine forelimb and hindlimb tissues, it linked the difference in Pitx1 expression state (inactive and active, respectively) to the tissue-specific spatial conformation of the locus, which favors enhancer–promoter interactions in hindlimb and prevents them in forelimb (Figure 3d and e). As a further confirmation of the functional role of 3D conformation, a genomic inversion in forelimb leading to a partial arm-to-leg transformation and ectopic Pitx1 activation resulted in a hindlimb-like 3D structure [40] (Figure 3d and e).
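At the level of the polymer model, the in silico implementation of SVs described above for the EPHA4 locus amounts to editing the sequence of binding-site types before rerunning the simulations. The sketch below shows hypothetical helpers for a deletion, a tandem duplication, and an inversion acting on that sequence, plus the two read-outs mentioned in the text: an average contact map computed from an ensemble of simulated conformations, and a virtual 4C profile, i.e., the row of that map for a chosen viewpoint bead. The MD re-simulation step is not shown, the contact distance threshold is an assumed parameter, and the sketch ignores the allelic (heterozygous) context of the mutations.

```python
import numpy as np

# Hypothetical helpers acting on the sequence of binding-site types (colors)
# of a polymer model; start/end are bead indices, with end excluded.


def sv_deletion(colors, start, end):
    """Deletion: remove beads [start, end)."""
    return np.concatenate([colors[:start], colors[end:]])


def sv_duplication(colors, start, end):
    """Tandem duplication: insert a copy of [start, end) right after it."""
    return np.concatenate([colors[:end], colors[start:end], colors[end:]])


def sv_inversion(colors, start, end):
    """Inversion: reverse the order of beads in [start, end)."""
    out = colors.copy()
    out[start:end] = colors[start:end][::-1]
    return out


def contact_map(conformations, threshold):
    """Fraction of conformations in which each bead pair is closer than threshold.

    conformations: array of shape (n_conformations, n_beads, 3), e.g. snapshots
    from simulations of the (mutated) polymer.
    """
    diff = conformations[:, :, None, :] - conformations[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist < threshold).mean(axis=0)


def virtual_4c(cmap, viewpoint):
    """Virtual 4C profile: contacts of the viewpoint bead with every other bead."""
    return cmap[viewpoint]
```

Comparing virtual_4c profiles from the WT and mutant maps at, say, a promoter bead is the kind of comparison used to spot ectopic promoter–enhancer contacts.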

Although equilibrium models, such as the SBS model, have been shown to explain a large number of observed folding features, they have some important limitations. For example, the observation that loops between CTCF sites, found at a fraction of TAD boundaries, form almost exclusively between convergent motifs (CTCF convergence bias) [6] can be explained only with additional ad hoc hypotheses.

A different class of models, which can incorporate these aspects, traces chromatin folding back to a loop extrusion (LE) mechanism. In these models, loop-extruding factors, such as condensin and cohesin, handcuff two sites on the DNA chain and, sliding along it, actively extrude chromatin loops until they reach blocking factors, such as CTCF-binding sites of opposite (convergent) orientation [22,23]. Variants of the LE model have been developed that consider extrusion without an active, energy-consuming motor, driven instead by thermal diffusion or transcription-induced supercoiling [14,45]. Coarse-grained MD simulations of polymers subject to LE, with appropriate parameters (such as the processivity and density of extruding factors), reproduce well the formation of TADs, loops, and stripes in the simulated contact maps [22,23] and can explain the effects on chromatin organization of disrupting specific CTCF-binding sites [22]. Furthermore, LE models have been shown to recapitulate features of chromosome folding during mitosis [9]. However, it has been shown that CTCF inactivation makes TAD boundaries fuzzier but does not abolish them [46] and that neither cohesin nor CTCF depletion strongly affects gene expression [47,48]. In addition, the effects of some large genomic rearrangements, such as certain deletions, are clearly CTCF independent [43]. It is also difficult for LE models to explain recent observations of regulatory hubs, in which multiple genes and enhancers interact simultaneously in many-body contacts [11,49], a feature naturally captured by SBS-like models (see, e.g., Ref. [42]).
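The core rule of LE models can be illustrated with a one-dimensional toy simulation that tracks only the positions of the two legs of each loop-extruding factor (LEF) along a lattice of genomic sites. This is a deliberately reduced sketch, not the coarse-grained polymer simulations of Refs. [22,23]: it assumes that each leg moves one site per step, a constant per-step unloading probability (its inverse is the mean residence time, which sets the processivity), no collisions between LEFs, and a convergent blocking rule in which a leftward-moving leg stops at a forward-oriented CTCF site ('>') and a rightward-moving leg stops at a reverse-oriented one ('<'). The function name and site encoding are hypothetical.

```python
import numpy as np


def extrude(ctcf, loads, n_steps, unload_prob=0.0, seed=0):
    """Toy 1D loop extrusion; returns (left, right) anchors of the bound LEFs.

    ctcf: string with one character per site: '>' (forward CTCF motif),
          '<' (reverse motif) or '.' (no site).
    loads: loading sites; each LEF starts with both legs on its site.
    unload_prob: per-step dissociation probability (mean residence 1/unload_prob).
    """
    rng = np.random.default_rng(seed)
    n = len(ctcf)
    lefs = [(p, p) for p in loads]
    for _ in range(n_steps):
        bound = []
        for left, right in lefs:
            if rng.random() < unload_prob:
                continue  # LEF dissociates and its loop is released
            # Leftward leg advances unless at the chain end or on a '>' site.
            if left > 0 and ctcf[left] != '>':
                left -= 1
            # Rightward leg advances unless at the chain end or on a '<' site.
            if right < n - 1 and ctcf[right] != '<':
                right += 1
            bound.append((left, right))
        lefs = bound
    return lefs
```

With a convergent pair ("..>.....<.", loading at site 5) the loop stalls at the two CTCF sites, whereas with the same sites in divergent orientation the LEF runs past them to the ends of the lattice; this orientation dependence is built into LE models, while equilibrium models need ad hoc hypotheses to reproduce it.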

Interestingly, recent experiments showed that upon depletion of cohesin components, chromatin compartments are essentially preserved, as expected from a homotypic interaction mechanism, whereas TADs are largely disrupted, as expected from LE [47,48,50,51], with effects varying between studies, likely owing to differences in depletion efficiency between auxin-inducible degrons and other approaches. Observations of this kind led to the compelling hypothesis that the two mechanisms act together to shape chromatin organization [16,50,52,53], although their precise interplay (for example, whether they are competitive or cooperative) and their range of action remain to be investigated. Consistent with a competitive model, for example, approaches disrupting the cohesin-release proteins wings apart-like protein homolog (WAPL) and precocious dissociation of sisters protein 5 (PDS5) led to weaker A/B compartments [46,54], whereas disruption of the cohesin-loading factor NIPBL led to stronger compartments [50]. A recent study [51] analyzing the effects of ATP depletion, instead, showed that ATP is required for cohesin sliding from loading to anchor sites and for loop domain formation, but not for the maintenance of loop domains and compartments. Even more intriguingly, recent super-resolution imaging showed that TAD boundaries are highly variable in single cells and that cohesin depletion does not affect TAD formation in single cells but only at the population-average level, suggesting that cohesin plays a role in setting TAD boundary positions but is not necessary for their formation [11]. In the loci studied, these experimental results contradict a model in which CTCF–cohesin-based loop extrusion [14,22,23] is responsible for the formation of contacts [11].
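Statements such as "weaker" or "stronger" A/B compartments refer to a quantitative read-out of contact maps. One common choice, sketched below under simplifying assumptions, normalizes each diagonal of the map by its mean (observed/expected) and compares the average enrichment of same-compartment contacts (AA and BB) with that of cross-compartment contacts (AB and BA). The precise definitions may differ between studies (for example, in the exclusion of bins near the diagonal or in how saddle plots are built), and the function names here are hypothetical.

```python
import numpy as np


def observed_over_expected(cmap):
    """Divide each diagonal of a symmetric contact map by its mean value."""
    n = cmap.shape[0]
    oe = np.zeros((n, n))
    for d in range(n):
        diag = np.diagonal(cmap, offset=d).astype(float)
        mean = diag.mean()
        if mean > 0:
            i = np.arange(n - d)
            oe[i, i + d] = diag / mean   # upper diagonal
            oe[i + d, i] = diag / mean   # mirror to keep the map symmetric
    return oe


def compartment_strength(cmap, labels):
    """(AA + BB) / (AB + BA) on the observed/expected map.

    labels: array of 'A'/'B' compartment labels, one per bin.
    """
    oe = observed_over_expected(cmap)
    a = labels == 'A'
    b = ~a
    aa = oe[np.ix_(a, a)].mean()
    bb = oe[np.ix_(b, b)].mean()
    ab = oe[np.ix_(a, b)].mean()         # equals the BA mean for a symmetric map
    return (aa + bb) / (2.0 * ab)
```

A value close to 1 means no preference for same-compartment contacts; larger values indicate stronger compartmentalization.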

Discussion and perspectives

The application of models from polymer physics can help dissect the different mechanisms of chromatin spatial organization. Here, we focused, in particular, on a range of applications of the SBS model, which considers the textbook scenario where homotypic chromatin–chromatin interactions are mediated by diffusing molecular binders. We showed that simulations of the SBS model explain the patterns of chromatin contact maps, such as compartments or TADs, and the details of specific genomic loci,

Author Contributions

Simona Bianco: Writing - Original Draft, Writing - Review & Editing, Visualization, Supervision. Andrea M. Chiariello: Writing - Review & Editing, Supervision. Mattia Conte: Writing - Review & Editing, Visualization. Andrea Esposito: Writing - Review & Editing. Luca Fiorillo: Writing - Review & Editing. Francesco Musella: Writing - Review & Editing. Mario Nicodemi: Writing - Original Draft, Visualization, Supervision, Funding acquisition.

Conflict of interest statement

Nothing declared.

Acknowledgements

MN acknowledges grants from the National Institutes of Health Common Fund 4D Nucleome Program grant (1U54DK107977-01), the EU H2020 Marie Curie ITN (813282), the Einstein BIH Fellowship Award (EVF-BIH-2016-282), CINECA ISCRA (HP10CRTY8P), Regione Campania SATIN Project 2018–2020, and computer resources from the INFN, CINECA, ENEA CRESCO/ENEAGRID, and SCoPE/ReCaS at the University of Naples.

References (56)

  • R.A. Beagrie et al. Complex multi-enhancer contacts captured by genome architecture mapping. Nature (2017)
  • J. Dekker et al. The 4D nucleome project. Nature (2017)
  • S.S.P. Rao et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell (2014)
  • T. Nagano et al. Capturing three-dimensional genome organization in individual cells by single-cell Hi-C
  • J.H. Gibcus et al. A pathway for mitotic chromosome formation. Science (2018)
  • B. Bonev et al. Multiscale 3D genome rewiring during mouse neural development. Cell (2017)
  • B. Bintu et al. Super-resolution chromatin tracing reveals domains and cooperative interactions in single cells. Science (2018)
  • J.J. Parmar et al. How the genome folds: the biophysics of four-dimensional chromatin organization. Annu Rev Biophys (2019)
  • C.A. Brackley et al. Nonequilibrium chromosome looping via molecular slip links. Phys Rev Lett (2017)
  • S. Bianco et al. Polymer physics predicts the effects of structural variants on chromatin architecture. Nat Genet (2018)
  • A. Buckle et al. Polymer simulations of heteromorphic chromatin predict the 3D folding of complex genomic loci. Mol Cell (2018)
  • M. Nicodemi et al. Models of chromosome structure. Curr Opin Cell Biol (2014)
  • M. Nicodemi et al. Thermodynamic pathways to genome spatial organization in the cell nucleus. Biophys J (2009)
  • M. Barbieri et al. Complexity of chromatin folding is captured by the strings and binders switch model. Proc Natl Acad Sci (2012)
  • C.A. Brackley et al. Nonspecific bridging-induced attraction drives clustering of DNA-binding proteins and genome organization. Proc Natl Acad Sci (2013)
  • D. Jost et al. Modeling epigenome folding: formation and dynamics of topologically associated chromatin domains. Nucleic Acids Res (2014)
  • A.L. Sanborn et al. Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc Natl Acad Sci (2015)
  • A.M. Chiariello et al. Polymer physics of chromosome large-scale 3D organisation. Sci Rep (2016)