
How Animal miRNAs Structure Influences Their Biogenesis

P. S. Vorozheykin a,* and I. I. Titov a,b

a Novosibirsk State University, Novosibirsk, 630090 Russia

b Institute of Cytology and Genetics of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, 630090 Russia

*e-mail: pavel.vorozheykin@gmail.com

Received February 6, 2019; revised March 7, 2019; accepted April 24, 2019

Abstract: MicroRNAs are small non-coding RNAs that are involved in the post-transcriptional regulation of gene expression in various organisms. This article reviews recent advances in understanding the role of the primary and secondary structures of animal miRNA precursors at the successive stages of miRNA biogenesis and maturation. We also describe the effects of genetic variability and of the heterogeneity of miRNA ends, which play an important role in epitranscriptomics, as well as annotation errors in miRNA databases.

Keywords: miRNA, pre-miRNA, secondary structure, biogenesis, mirtron, single nucleotide polymorphism, mutation, epigenetics

INTRODUCTION

A large number of small RNAs that suppress unwanted genetic material or transcripts have been found in animals. These RNAs are characterized by their short length (20–30 nucleotides) and their association with proteins of the Argonaute family (AGO and PIWI proteins). Three classes of small RNAs are distinguished: microRNA (miRNA), short interfering RNA (siRNA), and piwi-interacting RNA (piRNA) [1]. The most studied class, miRNA, is characterized by a sequence length of ~22 nucleotides and is produced by cleavage of the primary transcript by the RNases III Drosha and Dicer [2]. A mature miRNA, together with one of the AGO proteins, binds to a target site in an mRNA and thereby promotes degradation of the mRNA or blocks its translation [3]. One miRNA can target several different mRNAs, which makes it possible to control the activity of a large number of proteins and biological processes in an organism. It is not surprising that blocking miRNA genes in animals leads to phenotypic changes and to various diseases [4]. Owing to this involvement in many regulatory processes in the cell, miRNAs are rapidly gaining popularity as an object of research (Fig. 1).

Fig. 1. The number of records in the miRBase database and the number of records in the PubMed database that mention miRNAs in the title or abstract.

In this review, we systematize the influence of the primary and secondary miRNA structures on their functions and biogenesis. We also address the problems of genetic and biochemical variability of miRNAs. The presented data will be useful for understanding the organization, regulation, and epigenetics of miRNAs.

Pri-miRNA TRANSCRIPTION

The maturation of animal miRNAs begins (Fig. 2) with the transcription of a long transcript (primary miRNA, pri-miRNA) by RNA polymerase II (or by RNA polymerase III for some miRNAs); this transcript contains one or several hairpins of miRNA precursors (pre-miRNAs), an m7G cap, and a poly(A) tail [5, 6]. The poly(A) tail may be absent when processing of the pri-miRNA by the Microprocessor complex begins before transcription ends [7].

Fig. 2. The main pathways of miRNA biogenesis in animals. Primary miRNA transcripts (pri-miRNAs) are transcribed by RNA polymerase II or III. The canonical pathway of biogenesis involves pri-miRNA processing by the Drosha and Dicer complexes. The noncanonical pathways of biogenesis include Dicer- and Drosha-independent maturation steps of miRNAs.

MicroRNA genes are located in different genomic regions: in introns of noncoding and protein-coding transcripts, in exons, or in intergenic regions; they are transcribed independently or as part of protein-coding host genes [8]. Many miRNA sequences are found at a short distance from each other (~3–50 kb); some of them form polycistronic transcription units (e.g., miR-100/let-7/miR-125 and the miR-17/92 cluster) [8, 9], whereas others (e.g., miR-30a/miR-30c-2) do not [10]. Some miRNAs are encoded on opposite DNA strands and are complementary to each other, for example, miR-3120 and miR-214 [11]. MicroRNAs that are located close together can be transcribed together. However, they can function post-transcriptionally and be regulated both together and independently through external mechanisms, showing different activities in various tissues and at various developmental stages of the organism [12, 13]. In addition to external regulation, changes in one miRNA of a cluster can lead to changes in the expression levels of neighboring miRNAs [14].

As with protein-coding genes, the transcription of miRNA genes can be regulated by transcription factors (TFs), which enhance or block pri-miRNA processing [15]. Moreover, when the expression of the transcription factors themselves is under the control of miRNAs, regulatory feedback loops are formed [16, 17]. These loops are part of a common network that regulates gene expression [18].

In addition to TFs, the processing of pri-miRNAs may depend on the methylation status of gene promoters, modifications of histones, or modifications of the RNA ends [19–21]. Changes in the miRNA nucleotide sequence, such as A>I editing, mutations, or single-nucleotide polymorphisms (SNPs), affect the processing of pri-/pre-/miRNA by altering the precursor structure and target affinity [21, 22].

Pri-miRNA PROCESSING IN THE NUCLEUS

After transcription, in the canonical pathway of biogenesis (Fig. 2), animal pri-miRNAs are cleaved by the Microprocessor complex, which consists of the RNase III Drosha and the RNA-binding protein DGCR8 (Pasha in D. melanogaster and C. elegans) [3]. The complex binds to the hairpin structure and cuts out a pre-miRNA with a length of ~65–70 nucleotides (nt) at a distance of ~11 nt from the single-stranded RNA tails and ~22 nt from the terminal loop [23]. The boundaries between single- and double-stranded RNA fragments are signals for Drosha processing. Each of the RNase III domains in Drosha (RIIIDa and RIIIDb) cuts one of the two hairpin branches in such a way that a pre-miRNA with an overhanging 3'-end is obtained [24]. Each cut defines one of the two terminal nucleotides of the future miRNA.

The conversion of pri-miRNA to pre-miRNA can be regulated, firstly, by the interaction of proteins with the components of the Microprocessor complex. For example, the p53 protein in cooperation with other proteins (p68, p72, etc.) regulates the excision of mir-16, mir-143, mir-145, and other pre-miRNAs [25, 26]. Secondly, the structure of the pri-miRNA itself, as well as other RNAs, can act as a regulator [22, 27]. The following regulatory elements are present in pri-miRNAs: the terminal loop and oligonucleotide motifs at the base of the hairpin, in the stem, and in the single-stranded ends of the RNA (Fig. 3). One of the most studied elements is the terminal loop; 14% of human pri-miRNAs contain conserved nucleotides in the terminal loop [28]. By binding to the terminal loop, the hnRNP A1 protein can either facilitate the processing of a pri-miRNA (by increasing the size of the inner loop, as in pri-mir-18a) or block it (pri-let-7) [29, 30]. The KSRP protein binds to the G-rich region in the terminal loop and promotes the excision of pre-let-7, pre-mir-196a, pre-mir-21, and other pre-miRNAs from the transcript [31]. Competition between the opposing factors KSRP and hnRNP A1 for binding to the terminal loop of pre-let-7 determines the level of miRNA expression [30]. The TDP-43 protein binds to UG-rich terminal loops (but not to the double-stranded RNAs of pre-mir-143 and pre-mir-574) and helps to position the Microprocessor complex on the precursor [32]. The YB-1 protein preferentially binds to the UYAUC motif in the terminal loop of human pri-/pre-mir-29b-2, blocking the binding of the Microprocessor and Dicer to the precursor [33]. DGCR8, a component of the Microprocessor complex, binds to the UGUG motif in the terminal loop and facilitates the positioning of the entire complex on the precursor and its cleavage [34, 35]. Protein binding to the terminal loop of a pri-miRNA can be part of regulatory circuits: for example, the LIN28 protein and the let-7 miRNA regulate each other, forming a negative feedback loop [36].

Fig. 3. Structural features of the canonical miRNA precursor. Typically, a pri-miRNA contains two single-stranded ends, a stem with small loops, and a terminal loop. MicroRNAs form a duplex with dinucleotide 3'-overhangs (highlighted in gray). Also shown are oligonucleotide motifs that can regulate processing and increase the accuracy of the Drosha or Dicer complexes. Arrows indicate variation in the cleavage sites.

In addition to the terminal loop, the pri-miRNAs contain other signals for proteins. The R-SMAD proteins promote Drosha processing via the R-SBE motif (CAGAC) in the double-stranded region of the hairpin [37]. The BRCA1 protein binds to the base of the pre-miRNA hairpin and inhibits maturation of the miR-155 miRNA; on the other hand, this binding speeds up the processing of let-7a-1, miR-16-1, miR-145, and miR-34 [38].

The splicing factor SF2/ASF binds to the stem of the pri-miRNA and helps to process pri-mir-7; the miR-7 miRNA in turn blocks the expression of this factor [39]. Another splicing factor, SRp20 (also known as SRSF3), binds to the CNNC motif (where N is any of the nucleotides A, U, G, or C). The motif is located at a distance of 16–18 nt from the Drosha cleavage site (Fig. 3) and increases the expression level of the pre-miRNA [35]. This motif is also required for the p72 RNA helicase, which connects the Microprocessor to other proteins and thereby facilitates processing [40]. Two other regulatory motifs are found in pri-miRNAs (Fig. 3): UG (~13–14 nt from the 5′-end of the 5′-miRNA) and GHG (where H is A, C, or U; in the double-stranded region, ~11 nt from the Drosha cleavage site) [41, 42].

The pri-miRNA motifs listed above (UG, CNNC, GHG, and UGUG) help to orient the Microprocessor complex relative to the pri-miRNA so that Drosha cleaves near the base of the pre-miRNA hairpin rather than near the terminal loop; they can also compensate for small structural defects of the pre-miRNA (small loops and deviation of the stem from the optimal length) [34, 42]. These motifs are widespread: CNNC/UG in the flanking single-stranded RNAs or UGUG in the terminal loop is found in 79% of human pri-miRNAs [41]. Moreover, in mammals, the presence of only some of these motifs is sufficient [43].
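The sequence motifs named above are short enough to be located with a simple pattern search. The following Python sketch is only an illustration: the function name `find_motifs` and the toy sequence are assumptions, and every occurrence is reported regardless of where it lies in the hairpin, so positional constraints such as the 16–18 nt spacing of CNNC from the Drosha cleavage site are not checked.

```python
import re

# Patterns for the motifs named in the text (RNA alphabet).
# N = any nucleotide, H = A, C, or U.
MOTIFS = {
    "UG": r"UG",
    "UGUG": r"UGUG",
    "CNNC": r"C[ACGU]{2}C",
    "GHG": r"G[ACU]G",
}


def find_motifs(rna):
    """Return {motif: [0-based start positions]}, overlapping hits included."""
    rna = rna.upper().replace("T", "U")
    hits = {}
    for name, pattern in MOTIFS.items():
        # A zero-width lookahead lets overlapping occurrences be reported.
        hits[name] = [m.start() for m in re.finditer(f"(?=({pattern}))", rna)]
    return hits


# Hypothetical toy sequence, not a real pri-miRNA.
toy = "AUGCAACGAGUGUGCCUUC"
for motif, positions in find_motifs(toy).items():
    print(motif, positions)
```

A realistic analysis would additionally require the secondary structure of the precursor, since each motif is functional only in a particular structural context (flanking single strands, stem, or terminal loop).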

RNA–RNA interactions also regulate the maturation of miRNAs. The mature miRNA mmu-miR-709 binds to the single-stranded region near pri-mir-15a/pri-mir-16-1 and suppresses their processing [44]. The let-7 pri-miRNA and its mature miRNA are linked by a positive feedback loop, in which the let-7 miRNA binds to a conserved complementary site at the 3'-end of the pri-miRNA transcript [45].

The Microprocessor complex can directly degrade mRNA: Drosha blocks the expression of the FSTL1 protein by cutting out the pre-mir-198 hairpin from the 3′-untranslated region (UTR) of the protein-coding mRNA [46]. The 5'-UTR of the mRNA of DGCR8 (the Drosha cofactor) contains the precursor hairpin pre-mir-1306, which is part of the regulatory feedback loop between Drosha and DGCR8 [47]. An excess of the miR-128-3p miRNA reduces the levels of Drosha and Dicer, thereby inhibiting the processing of all miRNAs and leading to lung cancer [48]. Thus, miR-1306, miR-198, and miR-128 can be both functional miRNAs and additional cis-regulatory elements.

In general, the Microprocessor complex produces a rich variety of hairpins, some of which are filtered out at the next stages of miRNA maturation, in particular, because of their unsuitable length, during transport from the nucleus to the cytoplasm.

TRANSPORT FROM THE NUCLEUS TO THE CYTOPLASM

After processing in the nucleus, the resulting pre-miRNA hairpin (Fig. 2) is transferred to the cytoplasm by the Exportin-5 protein (XPO5) and its cofactor Ran-GTP [49]. In some cases, other complexes can be involved, for example, Exportin-1 (XPO1) for mir-320, mir-484, and other pre-miRNAs with an m7G cap at the 5'-end that are formed directly by transcription [50, 51]. These complexes not only move the pre-miRNA hairpin but also protect it from degradation by cellular exonucleases. The transport complex is specific to the shape of the precursor and captures it like a “baseball mitt,” binding to the helix, the terminal loop, and the overhanging 3'-end [52, 53]. Thus, the transport complex can transfer any small hairpin RNAs (including viral or random ones), and these hairpins can regulate miRNA expression by competing with pre-miRNAs for the transport complex [54]. When components of the transport complex are genetically damaged, the structure of the complex no longer matches the structure of the precursor, and the complex cannot capture and move the pre-miRNA; this, in turn, leads to the accumulation of pre-miRNAs in the nucleus and a decrease in the miRNA expression level [55]. Viral miRNAs can also block transport, both by competing with cellular pre-miRNAs and by targeting the mRNAs of components of the transport complex, for example, Ran-GTP [56].

Pre-miRNA PROCESSING BY DICER AND ITS COFACTORS

After moving into the cytoplasm, the pre-miRNA binds to the Dicer complex (Fig. 2), which cleaves the precursor hairpin near the terminal loop and produces a miRNA duplex with overhanging 3'-ends [3]. Thus, at this stage, the second end of the miRNA is formed. Typically, Dicer contains the following domains: a helicase, PAZ, an α-helix, and two RNase III domains [57, 58]. The helicase forms a clamp-like structure that is adjacent to the RNase III domains, facilitates the recognition of the pre-miRNA terminal loop, and moves and fixes the precursor hairpin [58]. The PAZ domain has two pockets for binding to the pre-miRNA ends [57, 59]. Each of the two RNase III domains cuts one of the two pre-miRNA branches and releases the miRNA duplex from the terminal loop. PAZ and the RNase III domains are connected by an α-helix, the length of which determines the distance between these domains and, thus, the size of the produced duplex [57].

Essentially, Dicer can process short hairpins regardless of their nucleotide sequences [60]. Binding pockets in the PAZ domain are spaced at the length of two nucleotides of the overhanging 3′-end, such as that produced by Drosha. However, using only one pocket, Dicer may process hairpins without 3'-overhangs or with an overhanging 5'-end.

The Dicer structure determines the location of the cleavage site. For most pre-miRNAs, this site is located at a fixed distance from the overhanging 3'-end (the so-called “3'-counting rule”) [59]. Dicer can also recognize the phosphate group at the 5'-end of the hairpin and cut the pre-miRNA at a distance of ~22 nt from this end [61]. This so-called “5'-counting rule” is observed mainly when the stem at the hairpin base is not closed by a strong GC pair [61]. Deviations from these two rules arise from modifications of pre-miRNAs, such as a change in the length of the pre-miRNA ends, which shifts the Dicer cleavage site and the miRNA seed. Modifications of the 3'-ends occur more often than those of the 5'-ends, which makes the “5'-counting rule” more important for pre-miRNAs with a modified 3'-end.
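The practical difference between the two counting rules can be shown with a toy calculation. The sketch below is a deliberately simplified model and not a description of Dicer itself: the function names, the hypothetical hairpin sequence, and the use of the same distance of 22 nt for the 3'-counting rule (the text gives ~22 nt only for the 5'-counting rule) are all assumptions. It shows that a 2-nt extension of the 3'-end leaves the product measured from the 5'-end unchanged but shifts the product measured from the 3'-end, and with it the seed.

```python
def five_prime_counting(pre_mirna, distance=22):
    """5'-arm product: the first `distance` nt counted from the 5'-end."""
    return pre_mirna[:distance]


def three_prime_counting(pre_mirna, distance=22):
    """3'-arm product: the last `distance` nt counted from the 3'-end.

    Using the same default distance as for the 5'-end is an assumption.
    """
    return pre_mirna[-distance:]


def seed(mirna):
    """Seed region, positions 2-8 (1-based) of a miRNA."""
    return mirna[1:8]


# Hypothetical 60-nt hairpin sequence, not a real pre-miRNA.
toy = "GGCUAGCUAGGAUCCGAUUACGGAUCAAGCUUGAUCCGUAAUCGGAUCCUAGCUAGCCUU"
tailed = toy + "UU"  # the same hairpin with a 2-nt extension of the 3'-end

for name, hairpin in (("original", toy), ("3'-extended", tailed)):
    print(name,
          "| 5p product:", five_prime_counting(hairpin),
          "| 3p seed:", seed(three_prime_counting(hairpin)))
```

In this model, a precursor with a modified 3'-end keeps its original products only if the cut is measured from the 5'-end, which is one way to read the statement that the “5'-counting rule” matters most for such precursors.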

Like Drosha processing, Dicer processing is regulated in many ways. An important cis-regulatory element for both activation and repression of Dicer cleavage is the terminal loop of the pre-miRNA. For example, the KSRP and TDP-43 proteins control Dicer activity by binding to the terminal loop [31, 32]. The LIN28A and LIN28B proteins recognize the GGAG motif in the terminal loop (for example, of let-7, miR-107, miR-143, miR-200c, and others) and recruit a protein from the family of terminal uridylyltransferases (for example, TUT4 or TUT7) [62]. The transferase adds an oligo(U) tail to the 3'-end of the pre-miRNA; this blocks Dicer processing and leads to pre-miRNA degradation, in particular, when the overhanging end has the canonical 2-nt length [63]. A similar mechanism of blocking miRNA maturation through uridylation of the 3'-end of the hairpin is also present in Drosophila [64], where the Tailor transferase recognizes and uridylates pre-miRNAs with guanine at the 3'-end, which may explain why guanine is avoided at this end in observed miRNAs [65]. Conversely, for pre-miRNAs with a single-nucleotide overhang (for example, let-7 and miR-105), TUT4 restores the canonical length, which promotes their canonical processing [66].

Another motif within the pre-miRNA terminal loop, UGC, binds to the MBNL1 protein, which helps to cleave the precursor [67]. MBNL1 competes with LIN28 for binding to the terminal loop and thus protects pre-miRNA from elongation of the 3'-end [67]. The GCAUG motif (Fig. 3) in the terminal loop binds to Rbfox proteins and inhibits the maturation of human miRNAs miR-20b, miR-107, and others [68].

In addition to proteins, various RNAs can also affect Dicer processing: the C. elegans noncoding RNA rncs-1 can compete with endogenous double-stranded RNAs for binding to Dicer [69]. MicroRNAs can bind to Dicer mRNAs and thus regulate their own expression through a feedback loop (e.g., let-7 and miR-BART6-5p for human Dicer) [70].

The sequence and structure of pre-miRNAs determine not only the speed but also the precision of precursor cleavage. The accuracy of Dicer cleavage obeys the so-called “loop-counting rule”: cutting of the 3'-branch of a pre-miRNA is more accurate when a loop is located at a distance of two nucleotides upstream of the cleavage site [71]. In other cases, Dicer produces variable 5'-ends of 3'-miRNAs [71].

miRISC AND AGO PROCESSING

Once formed, the miRNA duplex, together with one of the proteins of the Argonaute family (AGO) and cofactors, forms the miRNA-induced silencing complex (miRISC). This complex unwinds the duplex and selects one of its strands, which will subsequently bind to mRNA [72]. During duplex unwinding, one of the strands (the “guide”) is more commonly used to form the mature miRISC complex, while the other strand (the “passenger”) is cleaved or quickly degraded. The ratio of the “guide” and “passenger” miRNA fractions is determined by thermodynamic stability and the terminal nucleotide, and may vary depending on the developmental stage of the organism, its sex, the tissue in which expression occurs, or the orientation of the duplex when it is loaded into the complex [60, 73]. AGO proteins prefer a strand with nucleotide A or U at the 5'-end or the duplex strand whose terminal pair at this end is less stable [74]. On the other hand, a strand switch can be caused by editing that destabilizes the duplex [75], and this editing is more often observed in the miRNA seed (in humans and mice, but not in D. melanogaster) [76]. The selected strand may also change owing to a displacement of the pre-miRNA within the transcript, the so-called “hairpin shift” [77].
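The two sequence-based preferences mentioned above (A or U at the 5'-end, and a less stable terminal pair) can be combined into a toy predictor of the “guide” strand. The sketch below is a heuristic under explicit assumptions and not a published algorithm: the function names, the crude pair strengths (G-C stronger than A-U, which is stronger than G-U, with 0 for a mismatch), the order in which the two criteria are applied, and the example duplex are all hypothetical.

```python
# Crude relative strengths of base pairs (assumption, not measured energies).
PAIR_STRENGTH = {
    frozenset(("G", "C")): 3,
    frozenset(("A", "U")): 2,
    frozenset(("G", "U")): 1,
}


def end_pair_strength(strand, partner, overhang=2):
    """Strength of the base pair at the 5'-end of `strand`.

    With `overhang`-nt 3'-overhangs, the 5'-terminal nucleotide of one strand
    faces the partner nucleotide that precedes the partner's overhang.
    """
    facing = partner[-(overhang + 1)]
    return PAIR_STRENGTH.get(frozenset((strand[0], facing)), 0)


def pick_guide(strand_5p, strand_3p):
    """Return "5p", "3p", or None when the heuristic cannot decide."""
    def score(strand, partner):
        starts_with_a_or_u = 1 if strand[0] in "AU" else 0
        # A weaker terminal pair gives a higher score.
        return (starts_with_a_or_u, -end_pair_strength(strand, partner))

    score_5p = score(strand_5p, strand_3p)
    score_3p = score(strand_3p, strand_5p)
    if score_5p == score_3p:
        return None
    return "5p" if score_5p > score_3p else "3p"


# Hypothetical duplex with 2-nt 3'-overhangs, not a real miRNA.
strand_5p = "UACGGAUCCGAUUACGGAUCAA"
strand_3p = "GAUCCGUAAUCGGAUCCGUACU"
print(pick_guide(strand_5p, strand_3p))
```

As the paragraph above stresses, real strand ratios also depend on tissue, developmental stage, and duplex editing, none of which is captured by such a sequence-only rule.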

AGO proteins and their functions vary between species [78]. In some species, AGO proteins are specialized for particular kinds of small RNAs depending on the characteristics of their duplexes. For example, Drosophila Ago1 prefers to bind to a duplex in which uracil is located at the 5'-end of the “guide” miRNA, while Ago2 prefers a miRNA duplex with cytosine at the same end [79]. In humans, the binding of miRNA to proteins of the Argonaute family (Ago2 and Ago3) correlates with the presence of YRHB tetranucleotides near the miRNA 3'-end, and the binding of miRNA to Ago2 correlates with the presence of RHHK tetranucleotides in the center of the sequence [80]. Loops in positions 9–10 of the “guide” Drosophila miRNA direct the miRNA duplex to Ago1 but do not allow it to be processed by Ago2 [81]. That is, in the absence of loops, the “passenger” miRNA is cleaved, and the “guide” is incorporated into the miRISC complex with Ago2. Conversely, if there are loops in the center of the duplex, the “passenger” strand is degraded, and the “guide” strand is incorporated into the miRISC complex with Ago1. In humans, however, loops in the middle of the miRNA duplex have no effect on the choice of AGO protein [82]. An alternative strand selection can become evolutionarily fixed together with new miRNA functions [77].

NONCANONICAL PATHWAYS OF BIOGENESIS

In addition to the canonical pathway of biogenesis described above, miRNAs can mature by other, Dicer- and Drosha-independent, pathways (Fig. 2). Most of the noncanonical miRNAs are mirtrons, which skip the Drosha step of miRNA processing; their pre-miRNAs are formed through splicing, lariat debranching, and refolding into the canonical hairpin structure [83]. Some mirtrons contain additional nucleotides at one or both ends of the hairpin, which distinguishes their structure from the canonical one. Subsequently, these nucleotides are removed by exonucleases, and the maturation of the miRNA continues along the canonical pathway [83].

Although hundreds of mirtrons have been discovered to date [84, 85], the features of their maturation are less studied than those of canonical miRNAs. On average, the hairpin of a mirtron pre-miRNA is slightly longer than the canonical precursor [85]. After splicing, most of the mirtron precursors (so-called “tailed mirtrons”) contain long extensions of nucleotides between the splice site and the base of the pre-miRNA hairpin [84]. Trimming of the tails of these mirtrons by exonucleases can shift the miRNA ends [86]. In many mirtrons without tails and in mirtrons with a 3'-tail (so-called “3'-tailed mirtrons”), the guanine located at the intron–exon boundary is additionally removed from the 5'-end [85]. This guanine removal may be an additional mechanism for increasing the accuracy of pre-miRNA cleavage by the RNase III Dicer [87].

The typical structure of mirtron precursors carries characteristic imprints of their biogenesis and differs from the structure of canonical pre-miRNAs: the overhanging end at the base of the hairpin often consists of one rather than two nucleotides [87]. Compared to canonical miRNAs, mirtrons have a higher density of single-nucleotide polymorphisms (SNPs), and the SNPs themselves are inversely associated with diseases. Mirtrons are probably under positive (driving) selection, while most canonical miRNAs are under negative (stabilizing) selection; mirtrons can be an inherent source of variability that contributes to disease. Interestingly, the splicing branchpoint nucleotide is often positioned so that it is not shielded by the structure of the mirtron precursor [87]. In this way, mirtrons can easily arise from random hairpins of a suitable size near the 3'-splice site. As a result, mirtrons appear and disappear faster than canonical miRNAs, even if they both generate functionally identical regulatory RNAs [88].

The biogenesis of the mammalian miRNAs miR-1225 and miR-1228, originally described as mirtrons, is in fact independent of splicing [89]. Moreover, their further biogenesis requires Drosha but not the canonical components DGCR8, Dicer, Exportin-5, or AGO [90]. This class of miRNAs is called simtrons (splicing-independent mirtron-like RNAs). Some introns contain both mirtrons and simtrons, and the resulting miRNAs are the product of a choice between the two biogenesis pathways [89].

More recently, another splicing-dependent class of miRNAs, agotrons, was discovered; their biogenesis includes the integration of the miRNA into miRISC and interaction with a protein of the Argonaute family but does not require the Dicer cleavage step [91].

Another Dicer-independent pathway is observed for the miR-451 miRNA family, in which the Ago2 protein carries out the catalytic function of Dicer [92, 93]. Drosha generates the pre-mir-451 precursor with a stem of ~18 nucleotides, which is too short for Dicer processing. The Ago2 protein cleaves the pre-miRNA stem in the middle of the 3'-branch and produces a ~30-nucleotide RNA product [92, 93]. The poly(A)-specific ribonuclease PARN then trims the 3'-end of this product and releases the mature 5'-miRNA; the activity of this miRNA does not differ from that of the initial 30-nucleotide product [94]. Also, in contrast to other pre-miRNAs, the terminal loop does not affect the maturation of miR-451 [94]. The Ago2 protein can replace Dicer not only for short hairpins but also when Dicer is absent; conversely, a miR-451 precursor with a restored canonical stem length can be processed by Dicer [95]. Thus, the Ago2 and Dicer pathways appear to be interchangeable for miR-451.

MUTATIONS, SINGLE-NUCLEOTIDE POLYMORPHISMS, A>I EDITING

Changes in the pri-/pre-/miRNA sequences can affect the secondary structure of the precursor and thus change the expression level and function of the miRNA. Nucleotide substitutions are more often observed in the center of the miRNA and in the vicinity of its ends in the precursor; the regions responsible for miRNA targeting, the so-called seed (positions 2–8 of the miRNA) and additional seed (positions 13–16), are more conserved [96]. These two regions are also characterized by the lowest frequency of loops [87, 96]; a similar relationship between the intensity of mutagenesis and the secondary structure is characteristic of structural RNAs, for example, tRNA [97].

A single-nucleotide polymorphism (SNP) is a type of DNA substitution that can affect the maturation and function of miRNAs. SNPs in human pre-miRNAs can decrease or increase miRNA expression at the cleavage steps of Drosha (for example, miR-30c, miR-125a, miR-146a, miR-510, etc.) or Dicer (e.g., miR-196a, etc.) [22, 98]. A C>T substitution in the first position of the CNNC motif slows down the Drosha processing of pri-mir-16-1 [41]. In addition, SNPs inside the miRNA (especially inside the seed region) or inside its target influence miRNA function [99]. A substitution in the first nucleotide of human miR-934-5p leads to a change in the cleavage sites of Dicer and Drosha, as well as to a change in the strand of the miRNA duplex that is incorporated into the miRISC complex [100].
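Whether a substitution falls into one of the targeting regions can be checked by position alone. The Python sketch below uses the seed (positions 2–8) and additional seed (positions 13–16) coordinates given earlier in this section; the function names and the example miRNA sequence are hypothetical and serve only as an illustration.

```python
SEED_POSITIONS = range(2, 9)               # positions 2-8, 1-based
ADDITIONAL_SEED_POSITIONS = range(13, 17)  # positions 13-16, 1-based


def region_of(position):
    """Name the miRNA region that contains a 1-based position."""
    if position in SEED_POSITIONS:
        return "seed"
    if position in ADDITIONAL_SEED_POSITIONS:
        return "additional seed"
    return "other"


def seed(mirna):
    """Seed region, positions 2-8 (1-based) of a miRNA."""
    return mirna[1:8]


def substitute(mirna, position, alt):
    """Return the miRNA with the nucleotide at a 1-based position replaced."""
    if not 1 <= position <= len(mirna):
        raise ValueError("position outside the miRNA")
    return mirna[:position - 1] + alt + mirna[position:]


# Hypothetical 22-nt miRNA, not a real sequence.
mirna = "UACGGAUCCGAUUACGGAUCAA"
for position, alt in ((4, "A"), (10, "U")):
    variant = substitute(mirna, position, alt)
    print(position, region_of(position), seed(mirna), "->", seed(variant))
```

A substitution outside these regions leaves the seed intact but, as described above, may still alter Drosha or Dicer cleavage through its effect on the precursor structure, which this positional check does not model.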

Mutations in the protein-coding genes of the miRNA processing complexes (e.g., Dicer, Drosha, or Exportin-5) may block the functions of these complexes [55, 101].

Another way to regulate miRNA maturation is RNA editing [102]. The ADAR enzyme replaces adenosine (A) with inosine (I), which can either block or facilitate Drosha or Dicer cleavage [103]. Modifications of pri-/pre-miRNA sequences (for example, in pri-mir-151 or in the human and mouse miR-376 cluster) change the stability of the pri-miRNA structure and the miRNA targets and thus can block the Dicer cleavage step [76, 104, 105]. In addition, such editing of one strand of the miRNA duplex can result in the choice of the other strand as the functional one [76].

Remarkably, although some of the entries in SNP databases are actually RNA editing events [106], A>I editing density behaves oppositely to the density of SNPs and is more often observed in the seed region of human and mouse miRNAs (but not D. melanogaster) [76].

HETEROGENEITY OF miRNA ENDS AND ANNOTATION ERRORS

In the canonical pathway of biogenesis, Drosha and Dicer often cut the same precursor hairpin at several neighboring positions (Fig. 3) and thus produce miRNAs with different ends, so-called isomiRs [107]. The accuracy of Drosha cleavage is sensitive to pri-miRNA sequence motifs and is controlled by DGCR8 [35]. The RIIIDa domain of Dicer cleaves more precisely than the RIIIDb domain and is more sensitive to both the secondary structure and the nucleotide sequence of its substrate [65, 71]. As a result, the miRNA 5'-end (the one adjacent to the seed region) is more homogeneous, and the heterogeneity of cleavage appears mainly as variability in the lengths of the 3'-overhangs [65, 108]. The structural similarity of Dicer and Drosha [109], as well as the distribution of the overhang lengths of miRNA duplexes [87], suggests that the same is true for Drosha. The lengths of the overhanging ends are interdependent, which served as the basis for the double-lever model of the dynamics of the Dicer cleavage complex [87].

The shift of miRNA ends in the precursor can change targets, thereby expanding the functional repertoire of the miRNA [110, 111]. The overhanging ends can be additionally modified in both directions by adenylation, uridylation, and the activity of exonucleases, which leads to extra heterogeneity of the miRNA ends (in most cases, the 3'-ends) [112–114]. Thus, the miRNA 5'-end, which is responsible for target selection, is more uniform. However, for some miRNAs, for example, miR-79, miR-193, and miR-210 in D. melanogaster and miR-124, miR-133a, and miR-223 in M. musculus [77], two or more miRNA fractions (isomiRs) are observed at levels comparable to that of the main form. A change of the main form caused by a 5'-end shift (Fig. 4), the so-called seed shifting, leads to a change in the miRNA targets and is sometimes observed in evolution [115]. It is worth noting that only about a third of the annotated miRNAs from miRBase (release 21.0) are identical to the main form of their isomiRs, while about 37% of the miRNAs (Fig. 4) were never the most frequently observed ones [116].
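The comparison of an annotated miRNA with the main form of its isomiRs can be sketched in a few lines. In the Python example below, the precursor sequence, the read counts, and the function names are all hypothetical; the sketch only illustrates how a one-nucleotide shift of the 5'-end in the most abundant read changes the seed (positions 2–8) relative to the annotated form.

```python
def seed(mirna):
    """Seed region, positions 2-8 (1-based) of a miRNA."""
    return mirna[1:8]


def main_isomir(read_counts):
    """Return the most abundant sequence from a {sequence: count} mapping."""
    return max(read_counts, key=read_counts.get)


def five_prime_shift(precursor, annotated, observed):
    """Offset (nt) of the observed 5'-end relative to the annotated 5'-end."""
    start_annotated = precursor.find(annotated)
    start_observed = precursor.find(observed)
    if start_annotated < 0 or start_observed < 0:
        raise ValueError("sequence not found in the precursor")
    return start_observed - start_annotated


# Hypothetical precursor, annotation, and read counts (illustration only).
precursor = "GGCUAGCUAGGAUCCGAUUACGGAUCAAGCUUGAUCCGUAAUCGGAUCCUAGCUAGCCUU"
annotated = precursor[5:27]
reads = {
    precursor[5:27]: 1200,   # identical to the annotated form
    precursor[6:28]: 15000,  # 5'-end shifted by one nucleotide
    precursor[6:27]: 300,    # shifted 5'-end and shorter 3'-end
}

dominant = main_isomir(reads)
print("annotated is the main form:", dominant == annotated)
print("5'-end shift:", five_prime_shift(precursor, annotated, dominant))
print("seed:", seed(annotated), "->", seed(dominant))
```

With real data, the read counts would come from sequencing experiments (for example, those summarized in miRBase), and reads mapping to several precursors would need separate handling.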

Fig. 4. Example of a change of the main isomiR form and of seed shifting in D. melanogaster. Data were obtained from the miRBase database. In the pre-miRNA sequences, annotated miRNAs are highlighted in gray. Read sequences are underlined; only fractions with at least 10 000 observations are shown. In dme-mir-79, read fractions with different start positions are observed. Each of these positions is evolutionarily fixed in the dme-mir-9c and dme-mir-4 precursors in miRNA forms with a corresponding one-nucleotide seed shift. Note that, in miRBase, the main fraction of reads is not always identical to the annotated miRNA (for example, dme-miR-9c-3p or dme-miR-4-3p), which may indicate database errors.

Compared with canonical miRNAs, the ends of mirtrons are more variable and serve as an additional source of heterogeneity of the mature product. As a result of splicing, mirtrons can have extra tails, which are subsequently trimmed by 3'–5' and 5'–3' exonucleases [84]. In most mirtrons, the pre-miRNA hairpin is located near the 3'-splice site and therefore has a long overhanging 5'-end, whose trimming defines the 5'-end of the 5'-miRNA and its targets. For the remaining mirtrons, the 5'-end of the 5'-miRNA is often determined by splicing and the subsequent loss of the terminal guanine [85]. On the opposite strand of the pre-miRNA, the terminal uridylyltransferase Tailor recognizes the guanine at the 3'-end of the splice site (3'-AG) and uridylates this end of the precursor, which blocks further miRNA maturation in Drosophila [64]. Thus, although Drosha is absent from the biogenesis of mirtrons, they, like canonical pre-miRNAs [65], avoid guanine at the 5' (3')-end of their 5' (3')-miRNA, respectively. This suggests that guanine at the pre-miRNA ends affects precursor processing by Dicer.

A comparison of miRNA expression levels with the level of RNA degradation in different tissues shows that “new” miRNAs (added in the latest miRBase releases) demonstrate increased expression at a high degree of RNA degradation and therefore may be artifacts [117]. Such false-positive sequences, as well as the heterogeneity of miRNA ends, lead to annotation errors in databases. The identification and verification of miRNAs are complicated by their small size and noncoding nature. The development of deep sequencing methods and of bioinformatics algorithms for processing their results has led to a rapid increase in the number of miRNA candidates, including false ones. The following criteria are used to filter out false-positive miRNA sequences: evolutionary conservation, experimental validation of a miRNA duplex with dinucleotide 3'-overhangs, structural sequence similarity, and homogeneity of the miRNA 5'-ends [118]. Only one-third of human miRNAs and about one-sixth of animal miRNAs from miRBase satisfy these criteria; the MirGeneDB database of conditionally valid miRNAs was formed from them [118]. Almost none of the noncanonical miRNAs satisfy the criteria described above.

False-positive miRNAs can be generated by transcriptional noise, by other types of small RNAs and tRNAs, and by transposons. Another source of error is the contamination of data during experiments, including the presence of miRNAs from another organism (exogenous miRNAs, or xenomiRs), for example, plant miRNAs in mice and humans [119]. So far, xenomiRs have not been observed in reproducible experiments, and the results of the experiments in which they were observed are not freely available, which makes it impossible to check their purity independently [120].

CONCLUSIONS

Much has become known about miRNAs, their biogenesis, and their functions, but some questions remain open. The integration of miRNAs into various regulatory loops, as well as the influence of the primary and secondary structures of pri-/pre-/miRNAs on their biogenesis and functions, provides a rich repertoire of methods for controlling cellular processes, and we evidently know only a small part of them. The diversity of miRNA biogenesis, structural changes in pri-/pre-/miRNAs and their consequences, as well as xenomiRs and isomiRs, require further study. Although each miRNA sequence may be present in several slightly different forms, the significance of their abundance is still unknown. The inherent biochemical variability of miRNAs increases their functional repertoire and can lead to genetic variability of the miRNAs themselves and of their control systems, in particular, editing mechanisms. Owing to their properties, miRNAs occupy an important, possibly central, place in RNA epigenetics, otherwise known as epitranscriptomics.