The challenge of RNA branching prediction: a parametric analysis of multiloop initiation under thermodynamic optimization

doi:10.1016/j.jsb.2020.107475

Journal of Structural Biology

Volume 210, Issue 1, 1 April 2020, 107475

https://doi.org/10.1016/j.jsb.2020.107475

Abstract

Prediction of RNA base pairings yields insight into molecular structure, and therefore function. The most common methods predict an optimal structure under the standard thermodynamic model. One component of this model is the equation which governs the cost of branching, where three or more helical “arms” radiate out from a multiloop (also known as a junction). The multiloop initiation equation has three parameters; changing those values can significantly alter the predicted structure. We give a complete analysis of the prediction accuracy, stability, and robustness for all possible parameter combinations for a diverse set of tRNA sequences, and also for 5S rRNA. We find that the accuracy can often be substantially improved on a per sequence basis. However, simultaneous improvement within families, and most especially between families, remains a challenge.

Introduction

Knowing the intra-sequence base pairings of an RNA molecule is typically a crucial step in understanding its function (Tinoco and Bustamante, 2000, Doudna, 2000). Towards this end, thermodynamic optimization prediction methods remain essential tools for RNA structural biology (Mathews and Turner, 2006), even as the ribonomics field moves forward (Schuster et al., 1997, Major and Griffey, 2001, Gardner and Giegerich, 2004, Ding, 2006, Leontis et al., 2006, Mathews, 2006, Shapiro et al., 2007, Flamm and Hofacker, 2008, Eddy, 2014).

A set of pseudoknot-free, canonical base pairs for a single-stranded RNA sequence is called a secondary structure. Each base pair defines a substructure, such as a hairpin loop or a base pair stack. Our interest here is in the substructures known as multiloops (or junctions), which have three or more helical “arms” branching off. The canonical example of such a multiloop is the central single-stranded region in a 4-armed tRNA secondary structure. Multiloops determine the molecular shape (Giegerich et al., 2004) yet are some of the most difficult substructures to predict correctly (Doshi et al., 2004).
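To make the definition of a multiloop concrete, the following is a minimal sketch that locates multiloops in a secondary structure and counts their arms and unpaired nucleotides. It assumes the structure is given in dot-bracket notation (a convention not used in the text above), with balanced parentheses and no pseudoknots; the function names `pair_table` and `multiloops` are illustrative only.

```python
def pair_table(dotbracket):
    """Map each paired index to its partner; unpaired indices are absent."""
    stack, partner = [], {}
    for i, ch in enumerate(dotbracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def multiloops(dotbracket):
    """Return (closing_pair, n_helices, n_unpaired) for every multiloop.

    n_helices counts the closing helix plus each helix branching off the
    loop, so every reported loop has n_helices >= 3.
    """
    partner = pair_table(dotbracket)
    found = []
    for i in sorted(partner):
        j = partner[i]
        if j < i:
            continue  # visit each base pair once, from its 5' side
        branches, unpaired, k = 0, 0, i + 1
        while k < j:
            if k in partner:   # a helix branches off here; jump past it
                branches += 1
                k = partner[k] + 1
            else:              # unpaired nucleotide inside this loop
                unpaired += 1
                k += 1
        if branches >= 2:      # closing helix plus at least two branches
            found.append(((i, j), branches + 1, unpaired))
    return found


# Toy cloverleaf-like structure (not a real tRNA): one 4-armed multiloop.
toy = "((..((...))..((...))..((...))..))"
print(multiloops(toy))
```

For the toy structure, the single multiloop is closed by the pair at positions (1, 31), has four helices, and contains eight unpaired nucleotides. The exterior loop is deliberately not reported, since it is not enclosed by a base pair.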

The most common prediction methods use dynamic programming to efficiently generate a minimum free energy (MFE) structure as output (Zuker, 2003, Markham and Zuker, 2008, Gruber et al., 2008, Reuter and Mathews, 2010). The free energy change from the unpaired RNA sequence is approximated under the nearest neighbor thermodynamic model (NNTM). The model, and associated parameters, are available online through the Nearest Neighbor Database (NNDB) (Turner and Mathews, 2010). The $\Delta G$ of a secondary structure is the sum of its substructure NNTM values. Here we analyze the initiation score, intended to approximate the entropic penalty, given to a multiloop.

Multiloop stability under the NNTM is the sum of two types of free energy changes: an initiation term (generally unfavorable) and the various (favorable) values for the “stacking” of adjacent single-stranded nucleotides on base pairs in the loop. The stacking energies are based on experimental measurements (Jaeger et al., 1989, Mathews et al., 1999), but the initiation is a linear function in three (learned) parameters, originally chosen (Jaeger et al., 1989) for computational expediency:

$$\Delta G_{\mathrm{init}} = a + b \cdot [\text{number of unpaired nucleotides}] + c \cdot [\text{number of branching helices}]. \tag{1}$$

Previously, this simple entropy approximation was viewed with some concern (Diamond et al., 2001, Mathews and Turner, 2002, Lu et al., 2006), but recent results (Ward et al., 2017) demonstrate that it outperforms more complicated models in MFE prediction accuracy.
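As a worked instance of the linear initiation term, the sketch below evaluates it for one loop. The parameter values are hypothetical and chosen only to make the arithmetic easy to follow; they are not the published Turner parameters, and the function name `multiloop_initiation` is an illustrative choice.

```python
def multiloop_initiation(a, b, c, n_unpaired, n_helices):
    """Linear multiloop initiation score: a + b * unpaired + c * helices."""
    if n_helices < 3:
        raise ValueError("a multiloop has at least three branching helices")
    return a + b * n_unpaired + c * n_helices


# Hypothetical parameters (kcal/mol), for illustration only.
# A 4-armed loop with 8 unpaired nucleotides: 3.0 + 0.0 * 8 + 0.5 * 4 = 5.0
example = multiloop_initiation(a=3.0, b=0.0, c=0.5, n_unpaired=8, n_helices=4)
print(example)
```

Because the score is linear in $(a, b, c)$, changing any one parameter shifts the penalty of every multiloop in proportion to the corresponding count, which is what makes the parametric analysis described next tractable.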

To achieve the full potential of this linear model for multiloop initiation, we should understand how MFE predictions depend on the $(a, b, c)$ parameters. This is possible by applying mathematical theory to compute and analyze “RNA branching polytopes.” In this way, we can characterize the optimal branching of a given RNA sequence for every possible combination of $(a, b, c)$ . This approach, called a parametric analysis, permits us to quantify how much the accuracy can be improved, as well as other important characteristics like its stability and robustness.
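The idea behind a parametric analysis can be sketched on a toy scale. The sketch assumes, following the branching polytope literature rather than anything spelled out in this section, that each candidate structure is summarized by a signature $(x, y, z, w)$: the number of multiloops, the unpaired nucleotides in them, the branching helices, and the residual energy $w$ from all other NNTM terms. The total energy is then $a x + b y + c z + w$, and the MFE structure for a given $(a, b, c)$ is the minimizer. The three signatures below are invented for illustration; a real analysis considers all structures implicitly through the polytope rather than by enumeration.

```python
def energy(params, signature):
    """Total free energy a*x + b*y + c*z + w for one structure signature."""
    a, b, c = params
    x, y, z, w = signature
    return a * x + b * y + c * z + w


def mfe_signature(params, signatures):
    """Signature with minimum energy under the given (a, b, c)."""
    return min(signatures, key=lambda s: energy(params, s))


# Invented signatures (x, y, z, w); not taken from any real sequence.
candidates = [
    (0, 0, 0, -20.0),   # no multiloop
    (1, 8, 4, -32.0),   # one 4-armed multiloop
    (2, 10, 6, -38.0),  # two 3-armed multiloops
]

# Raising the per-multiloop offset a makes branching progressively costlier.
for params in [(3.0, 0.0, 0.5), (7.0, 0.0, 0.5), (12.0, 0.0, 0.5)]:
    print(params, "->", mfe_signature(params, candidates))
```

With these made-up numbers the optimum moves from two multiloops, to one, to none as $a$ increases. Only signatures that are optimal for some parameter choice matter, which is why a convex polytope built from the signatures can describe every possible MFE branching at once.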

We find that, on a per sequence basis, the accuracy can often be improved by a substantial amount, especially when it was originally low. However, the best predictions may require significantly different combinations of parameters. Hence, improving the average accuracy over a diverse set of sequences for a given RNA family, like tRNA or 5S rRNA, is much more challenging—but still possible.
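The gap between per-sequence and family-wide improvement can be illustrated with a small sketch. The accuracy table below is entirely made up to show the effect and does not reproduce any result from the paper; the parameter labels and sequence names are placeholders.

```python
# Made-up accuracies for illustration only: outer keys are parameter
# choices, inner keys are sequences. None of these numbers are real results.
accuracy = {
    "params_A": {"seq1": 0.90, "seq2": 0.40, "seq3": 0.55},
    "params_B": {"seq1": 0.50, "seq2": 0.95, "seq3": 0.50},
    "params_C": {"seq1": 0.70, "seq2": 0.70, "seq3": 0.75},
}


def best_per_sequence(table):
    """For each sequence, the parameter choice with the highest accuracy."""
    sequences = next(iter(table.values())).keys()
    return {s: max(table, key=lambda p: table[p][s]) for s in sequences}


def best_on_average(table):
    """The single parameter choice with the highest mean accuracy."""
    return max(table, key=lambda p: sum(table[p].values()) / len(table[p]))


print(best_per_sequence(accuracy))
print(best_on_average(accuracy))
```

In this toy table each sequence prefers a different parameter choice, and the choice that maximizes the average is below the per-sequence optimum for two of the three sequences. That is the sense in which improving the average is harder than improving any one prediction.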

However, our current approach cannot simultaneously achieve this improvement for both the tRNA and 5S rRNA families tested. This result highlights that, while the linear model for multiloop initiation in Eq. (1) can achieve very good accuracy, there may be a fundamental limit to possible improvements for MFE branching predictions.

Section snippets

Materials and methods

We investigate how MFE prediction under the NNTM depends on multiloop initiation parameters. In our analysis, we vary the parameters $(a, b, c)$ to characterize how the optimal branching changes, and its effect on important prediction characteristics.
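This snippet does not say how prediction accuracy is scored. One common choice for base-pair predictions, assumed here purely for illustration, is the F-measure: the harmonic mean of precision and recall over predicted base pairs. The function name `f_measure` and the example pairs are hypothetical.

```python
def f_measure(predicted_pairs, reference_pairs):
    """Harmonic mean of precision and recall over sets of (i, j) base pairs."""
    predicted, reference = set(predicted_pairs), set(reference_pairs)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical pairs: 2 of 3 predicted pairs are correct, 2 of 4 true pairs found.
predicted = [(0, 10), (1, 9), (2, 8)]
reference = [(0, 10), (1, 9), (3, 7), (4, 6)]
print(f_measure(predicted, reference))
```

Scoring each MFE structure this way against a reference structure gives one accuracy value per parameter choice, which is the quantity a parametric analysis would then compare across $(a, b, c)$.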

As listed in Table 1, each major revision of the NNTM has changed the multiloop initiation parameters. The original “Turner89” parameters (Jaeger et al., 1989) are no longer commonly used but are included here for completeness. The Turner99 ones (

Results and discussion

We address first the biological implications of our analysis, and defer the geometric details until later.

Conclusion

In this work we analyzed the effects of changing the three parameters $(a, b, c)$ used in the initiation score, which approximates the entropic penalty, given to a multiloop in the NNTM. For this purpose we leveraged tools from geometry that allow us to build so-called branching polytopes for a diverse set of tRNA and 5S rRNA sequences and analyze all possible MFE structures for each of them. We then used this comprehensive information to give a complete analysis of the prediction accuracy,

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was supported by funds from the National Science Foundation (DMS 1815832 to SP and DMS 1815044 to CEH).

References (34)

N.B. Leontis et al. The building blocks and motifs of RNA architecture. Curr. Opin. Struct. Biol. (2006)

F. Major et al. Computational methods for RNA structure determination. Curr. Opin. Struct. Biol. (2001)

D.H. Mathews. Revolutions in RNA secondary structure prediction. J. Mol. Biol. (2006)

D.H. Mathews et al. Prediction of RNA secondary structure by free energy minimization. Curr. Opin. Struct. Biol. (2006)

D.H. Mathews et al. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. (1999)

P. Schuster et al. RNA structures and folding: from conventional to new issues in structure predictions. Curr. Opin. Struct. Biol. (1997)

B.A. Shapiro et al. Bridging the gap in RNA structure prediction. Curr. Opin. Struct. Biol. (2007)

I. Tinoco et al. How RNA folds. J. Mol. Biol. (1999)

F. Barrera-Cruz et al. On the structure of RNA branching polytopes. SIAM J. Appl. Algebra Geometry (2018)

J.J. Cannone et al. The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinf. (2002)

P. Clote et al. Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency. Bioinformatics (2005)

J.M. Diamond et al. Thermodynamics of three-way multibranch loops in RNA. Biochemistry (2001)

Y. Ding. Statistical and Bayesian approaches to RNA secondary structure prediction. RNA (2006)

K.J. Doshi et al. Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction. BMC Bioinf. (2004)

J.A. Doudna. Structural genomics of RNA. Nat. Struct. Biol. (2000)

E. Drellich et al. Algebraic and Geometric Methods in Applied Discrete Mathematics, ch. Geometric combinatorics and computational molecular biology: branching polytopes for RNA sequences. AMS Contemp. Math. (2017)

S.R. Eddy. Computational analysis of conserved RNA secondary structure in transcriptomes and genomes. Annu. Rev. Biophys. (2014)

Cited by (7)

DERNA Enables Pareto Optimal RNA Design
2024, Journal of Computational Biology
StemP: A Fast and Deterministic Stem-Graph Approach for RNA Secondary Structure Prediction
2023, IEEE/ACM Transactions on Computational Biology and Bioinformatics
Scaling properties of RNA as a randomly branching polymer
2023, Journal of Chemical Physics
Scaling properties of RNA as a randomly branching polymer
2023, arXiv
Viral RNA as a branched polymer
2022, arXiv
StemP: A fast and deterministic Stem-graph approach for RNA and protein folding prediction
2022, arXiv
