The challenge of RNA branching prediction: a parametric analysis of multiloop initiation under thermodynamic optimization

https://doi.org/10.1016/j.jsb.2020.107475

Abstract

Prediction of RNA base pairings yields insight into molecular structure, and therefore function. The most common methods predict an optimal structure under the standard thermodynamic model. One component of this model is the equation which governs the cost of branching, where three or more helical “arms” radiate out from a multiloop (also known as a junction). The multiloop initiation equation has three parameters; changing those values can significantly alter the predicted structure. We give a complete analysis of the prediction accuracy, stability, and robustness for all possible parameter combinations for a diverse set of tRNA sequences, and also for 5S rRNA. We find that the accuracy can often be substantially improved on a per sequence basis. However, simultaneous improvement within families, and most especially between families, remains a challenge.

Introduction

Knowing the intra-sequence base pairings of an RNA molecule is typically a crucial step in understanding its function (Tinoco and Bustamante, 2000, Doudna, 2000). Towards this end, thermodynamic optimization prediction methods remain essential tools for RNA structural biology (Mathews and Turner, 2006), even as the ribonomics field moves forward (Schuster et al., 1997, Major and Griffey, 2001, Gardner and Giegerich, 2004, Ding, 2006, Leontis et al., 2006, Mathews, 2006, Shapiro et al., 2007, Flamm and Hofacker, 2008, Eddy, 2014).

A set of pseudoknot-free, canonical base pairs for a single-stranded RNA sequence is called a secondary structure. Each base pair defines a substructure, such as a hairpin loop or a base pair stack. Our interest here is in the substructures known as multiloops (or junctions), which have three or more helical “arms” branching off. The canonical example of such a multiloop is the central single-stranded region in a 4-armed tRNA secondary structure. Multiloops determine the molecular shape (Giegerich et al., 2004) yet are some of the most difficult substructures to predict correctly (Doshi et al., 2004).
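As a minimal sketch of the definition above, the following Python function locates multiloops in a pseudoknot-free secondary structure. It assumes the structure is given in dot-bracket notation (an input format chosen here for illustration, not one specified in the text), and the function name `find_multiloops` is hypothetical. For each loop closed by a base pair, it counts the helices radiating out (including the closing one) and the unpaired nucleotides, and reports loops with three or more helices.

```python
def find_multiloops(dot_bracket):
    """Return a list of (closing_pair, helices, unpaired) for every multiloop.

    Assumes a balanced, pseudoknot-free dot-bracket string such as
    "((..((...))..((...))..))". Positions are 0-indexed.
    """
    # Match parentheses to recover the base pairs.
    partner = {}
    stack = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i] = j
            partner[j] = i

    multiloops = []
    for i in sorted(partner):
        j = partner[i]
        if j < i:
            continue  # visit each pair once, from its 5' position

        # Walk the loop closed by (i, j), jumping over enclosed helices.
        enclosed_helices = 0
        unpaired = 0
        k = i + 1
        while k < j:
            if k in partner:
                enclosed_helices += 1
                k = partner[k] + 1
            else:
                unpaired += 1
                k += 1

        helices = enclosed_helices + 1  # the closing pair is also an arm
        if helices >= 3:
            multiloops.append(((i, j), helices, unpaired))
    return multiloops


# A toy 4-armed, cloverleaf-like structure (not a real tRNA).
toy = "((..((...))..((...))..((...))..))"
print(find_multiloops(toy))
```

On the toy input, the only loop reported is the central one, with four helices and eight unpaired nucleotides. The exterior loop is not closed by a base pair and is therefore not counted.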

The most common prediction methods use dynamic programming to efficiently generate a minimum free energy (MFE) structure as output (Zuker, 2003, Markham and Zuker, 2008, Gruber et al., 2008, Reuter and Mathews, 2010). The free energy change from the unpaired RNA sequence is approximated under the nearest neighbor thermodynamic model (NNTM). The model, and associated parameters, are available online through the Nearest Neighbor Database (NNDB) (Turner and Mathews, 2010). The ΔG of a secondary structure is the sum of its substructure NNTM values. Here we analyze the initiation score, intended to approximate the entropic penalty, given to a multiloop.

Multiloop stability under the NNTM is the sum of two types of free energy changes. There is an initiation term (generally unfavorable) and then the various (favorable) values for the “stacking” of adjacent single-stranded nucleotides on base pairs in the loop. The stacking energies are based on experimental measurements (Jaeger et al., 1989, Mathews et al., 1999), but the initiation term is a linear function, originally chosen (Jaeger et al., 1989) for computational expediency, in three (learned) parameters:

$$\Delta G_{\text{init}} = a + b \cdot [\text{number of unpaired nucleotides}] + c \cdot [\text{number of branching helices}] \tag{1}$$

Previously, this simple entropy approximation was viewed with some concern (Diamond et al., 2001, Mathews and Turner, 2002, Lu et al., 2006), but recent results (Ward et al., 2017) demonstrate that it outperforms more complicated models in MFE prediction accuracy.
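The linear initiation term can be written directly as a small function. This is a sketch only: the function name `multiloop_initiation` is hypothetical, and the parameter values in the usage line are illustrative placeholders, not values from any published NNTM parameter set.

```python
def multiloop_initiation(a, b, c, unpaired, helices):
    """Linear multiloop initiation score: a + b*unpaired + c*helices.

    unpaired: number of unpaired nucleotides in the multiloop
    helices:  number of branching helices (at least 3 for a multiloop)
    """
    if helices < 3:
        raise ValueError("a multiloop has three or more branching helices")
    if unpaired < 0:
        raise ValueError("the number of unpaired nucleotides cannot be negative")
    return a + b * unpaired + c * helices


# Placeholder parameters (NOT a published parameter set), applied to a
# 4-armed loop with 8 unpaired nucleotides.
print(multiloop_initiation(a=1.0, b=0.5, c=2.0, unpaired=8, helices=4))
```

Because the score is linear in (a, b, c), two multiloops with the same counts of unpaired nucleotides and helices always receive the same initiation score, whatever parameters are chosen.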

To achieve the full potential of this linear model for multiloop initiation, we should understand how MFE predictions depend on the (a,b,c) parameters. This is possible by applying mathematical theory to compute and analyze “RNA branching polytopes.” In this way, we can characterize the optimal branching of a given RNA sequence for every possible combination of (a,b,c). This approach, called a parametric analysis, permits us to quantify how much the accuracy can be improved, as well as other important characteristics like its stability and robustness.
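The idea of a parametric analysis can be illustrated with a brute-force sketch. It assumes, for illustration, that each candidate structure is summarized by a tuple (x, y, z, w): the number of multiloops x, the total unpaired nucleotides in multiloops y, the total branching helices z, and the remaining energy w from all other substructures, so that its total energy is a·x + b·y + c·z + w. The candidate list below is invented, and the grid sweep merely stands in for the exact polytope computation, which covers every parameter combination rather than a sample.

```python
from itertools import product


def energy(signature, a, b, c):
    """Total energy of a structure summarized as (x, y, z, w)."""
    x, y, z, w = signature
    return a * x + b * y + c * z + w


def optimal_signature(signatures, a, b, c):
    """Signature with minimum energy for one choice of (a, b, c)."""
    return min(signatures, key=lambda s: energy(s, a, b, c))


def sweep(signatures, a_values, b_values, c_values):
    """Group sampled parameter points by which signature is optimal there."""
    regions = {}
    for a, b, c in product(a_values, b_values, c_values):
        best = optimal_signature(signatures, a, b, c)
        regions.setdefault(best, []).append((a, b, c))
    return regions


# Hypothetical candidates: no multiloop, one 4-way loop, two 3-way loops.
candidates = [
    (0, 0, 0, -10.0),
    (1, 8, 4, -20.0),
    (2, 6, 6, -26.0),
]

for signature, points in sweep(candidates, [0, 8, 20], [0], [0]).items():
    print(signature, "is optimal at", points)
```

In this toy example, raising the per-multiloop penalty a shifts the optimum from the most branched candidate to the unbranched one, which is the kind of dependence the full analysis characterizes exactly.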

We find that, on a per sequence basis, the accuracy can often be improved by a substantial amount, especially when it was originally low. However, the best predictions may require significantly different combinations of parameters. Hence, improving the average accuracy over a diverse set of sequences for a given RNA family, like tRNA or 5S rRNA, is much more challenging—but still possible.

However, our current approach cannot simultaneously achieve this improvement for both the tRNA and 5S rRNA families tested. This result highlights that, while the linear model for multiloop initiation in Eq. (1) can achieve very good accuracy, there may be a fundamental limit to possible improvements for MFE branching predictions.

Materials and methods

We investigate how MFE prediction under the NNTM depends on multiloop initiation parameters. In our analysis, we vary the parameters (a,b,c) to characterize how the optimal branching changes, and its effect on important prediction characteristics.

As listed in Table 1, each major revision of the NNTM has changed the multiloop initiation parameters. The original “Turner89” parameters (Jaeger et al., 1989) are no longer commonly used, but are included here for completeness. The Turner99 ones (

Results and discussion

We address first the biological implications of our analysis, and defer the geometric details until later.

Conclusion

In this work we analyzed the effects of changing the three parameters (a,b,c) used in the initiation score, which approximates the entropic penalty, given to a multiloop in the NNTM. For this purpose we leveraged tools from geometry that allow us to build so-called branching polytopes for a diverse set of tRNA and 5S rRNA sequences and analyze all possible MFE structures for each of them. We then used this comprehensive information to give a complete analysis of the prediction accuracy,

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by funds from the National Science Foundation (DMS 1815832 to SP and DMS 1815044 to CEH).

References (34)

  • P. Clote et al.

    Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency

    Bioinformatics

    (2005)
  • J.M. Diamond et al.

    Thermodynamics of three-way multibranch loops in RNA

    Biochemistry

    (2001)
  • Y. Ding

    Statistical and Bayesian approaches to RNA secondary structure prediction

    RNA

    (2006)
  • K.J. Doshi et al.

    Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction

    BMC Bioinf.

    (2004)
  • J.A. Doudna

    Structural genomics of RNA

    Nat. Struct. Biol.

    (2000)
  • E. Drellich et al.

    Algebraic and Geometric Methods in Applied Discrete Mathematics, ch. Geometric combinatorics and computational molecular biology: branching polytopes for RNA sequences

    AMS Contemp. Math.

    (2017)
  • S.R. Eddy

    Computational analysis of conserved RNA secondary structure in transcriptomes and genomes

    Annu. Rev. Biophys.

    (2014)