The challenge of RNA branching prediction: a parametric analysis of multiloop initiation under thermodynamic optimization
Introduction
Knowing the intra-sequence base pairings of an RNA molecule is typically a crucial step in understanding its function (Tinoco and Bustamante, 2000, Doudna, 2000). Towards this end, thermodynamic optimization prediction methods remain essential tools for RNA structural biology (Mathews and Turner, 2006), even as the ribonomics field moves forward (Schuster et al., 1997, Major and Griffey, 2001, Gardner and Giegerich, 2004, Ding, 2006, Leontis et al., 2006, Mathews, 2006, Shapiro et al., 2007, Flamm and Hofacker, 2008, Eddy, 2014).
A set of pseudoknot-free, canonical base pairs for a single-stranded RNA sequence is called a secondary structure. Each base pair defines a substructure, such as a hairpin loop or a base pair stack. Our interest here is in the substructures known as multiloops (or junctions), which have three or more helical “arms” branching off. The canonical example of such a multiloop is the central single-stranded region in a 4-armed tRNA secondary structure. Multiloops determine the molecular shape (Giegerich et al., 2004) yet are some of the most difficult substructures to predict correctly (Doshi et al., 2004).
The most common prediction methods use dynamic programming to efficiently generate a minimum free energy (MFE) structure as output (Zuker, 2003, Markham and Zuker, 2008, Gruber et al., 2008, Reuter and Mathews, 2010). The free energy change from the unpaired RNA sequence is approximated under the nearest neighbor thermodynamic model (NNTM). The model, and associated parameters, are available online through the Nearest Neighbor Database (NNDB) (Turner and Mathews, 2010). The free energy change of a secondary structure is the sum of its substructure NNTM values. Here we analyze the initiation score, intended to approximate the entropic penalty, given to a multiloop.
Multiloop stability under the NNTM is the sum of two types of free energy changes. There is an initiation term (generally unfavorable) and then the various (favorable) values for the “stacking” of adjacent single-stranded nucleotides on base pairs in the loop. The stacking energies are based on experimental measurements (Jaeger et al., 1989, Mathews et al., 1999), but the initiation is a linear function, originally chosen (Jaeger et al., 1989) for computational expediency, in three (learned) parameters. Previously, this simple entropy approximation was viewed with some concern (Diamond et al., 2001, Mathews and Turner, 2002, Lu et al., 2006), but recent results (Ward et al., 2017) demonstrate that it outperforms more complicated models in MFE prediction accuracy.
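As a sketch, the linear multiloop initiation function usually associated with the NNTM can be written as below. The symbol names a, b, c for the three parameters, and the names n_unpaired and n_helices for the loop counts, are notation assumed here for illustration.

```latex
\Delta G_{\text{init}} = a + b \cdot n_{\text{unpaired}} + c \cdot n_{\text{helices}}
```

Under this assumed form, a is a fixed cost for forming a multiloop, b weights the number of unpaired nucleotides in the loop, and c weights the number of helices branching off it.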
To achieve the full potential of this linear model for multiloop initiation, we should understand how MFE predictions depend on the parameters. This is possible by applying mathematical theory to compute and analyze “RNA branching polytopes.” In this way, we can characterize the optimal branching of a given RNA sequence for every possible combination of the three multiloop initiation parameters. This approach, called a parametric analysis, permits us to quantify how much the accuracy can be improved, as well as other important characteristics of the prediction, like its stability and robustness.
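The idea of a parametric analysis can be illustrated with a small brute-force sketch. It assumes that each candidate structure is summarized by a signature (x, y, z, w): the number of multiloops, the unpaired nucleotides in them, the branching helices, and the remaining free energy, so that the total energy is a*x + b*y + c*z + w for parameters (a, b, c). The candidate names and numbers are hypothetical toy data, and the grid scan is a stand-in for illustration only, not the polytope computation itself.

```python
from itertools import product

# Hypothetical candidate structures for one sequence (toy data for illustration).
# Each signature is (x, y, z, w): number of multiloops, unpaired multiloop
# nucleotides, branching helices, and remaining free energy in kcal/mol.
CANDIDATES = {
    "unbranched": (0, 0, 0, -20.0),
    "three_arm": (1, 6, 3, -27.0),
    "four_arm": (1, 8, 4, -30.0),
}


def energy(signature, a, b, c):
    """Total free energy under the assumed linear form a*x + b*y + c*z + w."""
    x, y, z, w = signature
    return a * x + b * y + c * z + w


def mfe_structure(a, b, c, candidates=CANDIDATES):
    """Return the name of the candidate with minimum energy at (a, b, c)."""
    return min(candidates, key=lambda name: energy(candidates[name], a, b, c))


def scan(a_values, b_values, c_values):
    """Group grid points (a, b, c) by which candidate is optimal there."""
    regions = {}
    for a, b, c in product(a_values, b_values, c_values):
        regions.setdefault(mfe_structure(a, b, c), []).append((a, b, c))
    return regions


if __name__ == "__main__":
    for name, points in scan([-5, 0, 20], [0, 1], [0, 1.5]).items():
        print(name, len(points))
```

Each key of the dictionary returned by `scan` corresponds to one optimal branching pattern, and its list of grid points approximates the parameter region in which that pattern is the MFE prediction. A grid can miss small regions, which is one reason an exact geometric description is preferable.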
We find that, on a per sequence basis, the accuracy can often be improved by a substantial amount, especially when it was originally low. However, the best predictions may require significantly different combinations of parameters. Hence, improving the average accuracy over a diverse set of sequences for a given RNA family, like tRNA or 5S rRNA, is much more challenging—but still possible.
However, our current approach cannot simultaneously achieve this improvement for both the tRNA and 5S rRNA families tested. This result highlights that, while the linear model for multiloop initiation in Eq. (1) can achieve very good accuracy, there may be a fundamental limit to possible improvements for MFE branching predictions.
Materials and methods
We investigate how MFE prediction under the NNTM depends on multiloop initiation parameters. In our analysis, we vary the parameters to characterize how the optimal branching changes, and its effect on important prediction characteristics.
As listed in Table 1, each major revision of the NNTM has changed the multiloop initiation parameters. The original “Turner89” parameters (Jaeger et al., 1989) are no longer commonly used but are included here for completeness. The Turner99 ones (
Results and discussion
We address first the biological implications of our analysis, and defer the geometric details until later.
Conclusion
In this work we analyzed the effects of changing the three parameters used in the initiation score, which approximates the entropic penalty, given to a multiloop in the NNTM. For this purpose we leveraged tools from geometry that allow us to build so-called branching polytopes for a diverse set of tRNA and 5S rRNA sequences and analyze all possible MFE structures for each of them. We then used this comprehensive information to give a complete analysis of the prediction accuracy,
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was supported by funds from the National Science Foundation (DMS 1815832 to SP and DMS 1815044 to CEH).
References
- et al. The building blocks and motifs of RNA architecture. Curr. Opin. Struct. Biol. (2006)
- et al. Computational methods for RNA structure determination. Curr. Opin. Struct. Biol. (2001)
- Revolutions in RNA secondary structure prediction. J. Mol. Biol. (2006)
- et al. Prediction of RNA secondary structure by free energy minimization. Curr. Opin. Struct. Biol. (2006)
- et al. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. (1999)
- et al. RNA structures and folding: from conventional to new issues in structure predictions. Curr. Opin. Struct. Biol. (1997)
- et al. Bridging the gap in RNA structure prediction. Curr. Opin. Struct. Biol. (2007)
- et al. How RNA folds. J. Mol. Biol. (1999)
- et al. On the structure of RNA branching polytopes. SIAM J. Appl. Algebra Geometry (2018)
- et al. The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinf. (2002)
- Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency. Bioinformatics
- Thermodynamics of three-way multibranch loops in RNA. Biochemistry
- Statistical and Bayesian approaches to RNA secondary structure prediction. RNA
- Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction. BMC Bioinf.
- Structural genomics of RNA. Nat. Struct. Biol.
- Algebraic and Geometric Methods in Applied Discrete Mathematics, ch. Geometric combinatorics and computational molecular biology: branching polytopes for RNA sequences. AMS Contemp. Math.
- Computational analysis of conserved RNA secondary structure in transcriptomes and genomes. Annu. Rev. Biophys.