Elsevier

Theoretical Computer Science

Volume 843, 2 December 2020, Pages 153-163

Universal insertion grammars of size two

https://doi.org/10.1016/j.tcs.2020.09.002

Abstract

In this paper, we show that pure insertion grammars of size 2 (i.e., inserting two symbols in a left and right context, each consisting of two symbols) can characterize all recursively enumerable languages. This is achieved by either applying an inverse morphism and a weak coding, or a left (right) quotient with a regular LOC(2) language, or an intersection with a LOC(2) language and a weak coding. The obtained results improve the descriptional complexity of insertion grammars and complete the picture of known results on insertion-deletion systems that are motivated from the DNA computing area.

Introduction

An insertion grammar is a pure grammar (i.e., there are no non-terminals as opposed to terminal symbols) having only rules of the form uv → uxv. Such grammars originated in [6]; they are inspired by Marcus contextual grammars [17], [26] used in linguistics. Another motivation for their study comes from the area of biology. As pointed out in [27], the process of mismatched annealing of DNA strands can be seen as an insertion or a deletion of a string in a specified context. A similar process happens in the case of RNA editing [2], where the uracil base U is inserted or deleted in some left context, as well as in the case of CRISPR-Cas9 technology that uses insertion and deletion to edit the genome [28], [1]. These observations led to the intense study of insertion-deletion systems (considering insertion and deletion operations together) in the framework of DNA computing [13], [29], [11], [18], [19], [30], [31].
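To make the rule format concrete, the following is a minimal sketch of how a single insertion rule (insert x between a left context u and a right context v) could be applied to a string. The names `InsertionRule`, `apply_rule` and `derive` are illustrative assumptions and do not come from the paper; symbols are assumed to be single characters.

```python
from typing import NamedTuple


class InsertionRule(NamedTuple):
    """Hypothetical encoding of a rule: insert x between contexts u and v."""
    u: str  # left context
    x: str  # inserted string
    v: str  # right context


def apply_rule(word: str, rule: InsertionRule) -> set[str]:
    """Return every word obtainable from `word` by one application of `rule`."""
    results = set()
    for i in range(len(word) + 1):
        # Position i is the gap between word[:i] and word[i:].
        if word[:i].endswith(rule.u) and word[i:].startswith(rule.v):
            results.add(word[:i] + rule.x + word[i:])
    return results


def derive(axioms: set[str], rules: list[InsertionRule], steps: int) -> set[str]:
    """Collect all words reachable from the axioms in at most `steps` insertions."""
    reached = set(axioms)
    frontier = set(axioms)
    for _ in range(steps):
        # Only keep words that were not seen in an earlier round.
        frontier = {
            new_word
            for word in frontier
            for rule in rules
            for new_word in apply_rule(word, rule)
        } - reached
        reached |= frontier
    return reached
```

For instance, with the single rule `InsertionRule("a", "ab", "b")` and the axiom `"ab"`, two derivation steps reach `"aabb"` and `"aaabbb"`. Since rules only insert, every derived word is at least as long as the word it came from, which is why a pure insertion grammar needs an extra mechanism to filter or recode its output.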

There are several related models using a similar principle of insertion or deletion of a string in a specified context. We cite guided-insertion systems [3], used to model RNA editing; leftist grammars [20], used to model accessibility problems in protection systems; restarting automata [10], used to model the analysis by reduction; and the insertion operation from [8], introduced as a generalization of concatenation (which corresponds to a context-free insertion grammar).

Since an insertion grammar is a pure grammar, an additional squeezing mechanism must be used to obtain the final language; otherwise the described language class has poor closure properties. Usually, restricted transducers are used for this purpose. Standard examples are (1) intersection with the free monoid over a terminal alphabet (used traditionally for Chomsky grammars), (2) projection composed with an inverse morphism, (3) left/right quotient by a regular language, and (4) projection composed with intersection with a regular language. In the area of insertion grammars, variant (2) is used most often, because (1) yields at most context-sensitive languages [27].
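The squeezing operations listed above can be illustrated on finite samples of a language. The sketch below is only an illustration under stated assumptions: symbols are single characters, morphisms are given as Python dictionaries, the morphism passed to `inverse_morphism` is non-erasing (so that preimages stay finite), and the regular language is given as a Python regular expression. All function names are hypothetical.

```python
import re


def weak_coding(word: str, g: dict[str, str]) -> str:
    """Apply a weak coding: each symbol maps to one symbol or to the empty string."""
    return "".join(g[sym] for sym in word)


def inverse_morphism(word: str, h: dict[str, str]) -> set[str]:
    """Return all w with h(w) == word, assuming h is non-erasing."""
    if any(image == "" for image in h.values()):
        raise ValueError("erasing morphism: the preimage may be infinite")
    # preimages[i] holds every w whose image under h equals word[:i].
    preimages = [set() for _ in range(len(word) + 1)]
    preimages[0].add("")
    for i in range(len(word)):
        for prefix in preimages[i]:
            for sym, image in h.items():
                if word.startswith(image, i):
                    preimages[i + len(image)].add(prefix + sym)
    return preimages[len(word)]


def squeeze(sample: set[str], h: dict[str, str], g: dict[str, str]) -> set[str]:
    """Compute g(h^-1(sample)) for a finite sample of words."""
    return {weak_coding(w, g) for v in sample for w in inverse_morphism(v, h)}


def left_quotient(sample: set[str], regex: str) -> set[str]:
    """Return all w such that r + w is in the sample for some r matching regex."""
    pattern = re.compile(regex)
    return {
        word[i:]
        for word in sample
        for i in range(len(word) + 1)
        if pattern.fullmatch(word[:i])
    }
```

As a small usage example, with `h = {"a": "xy", "b": "x", "c": "y"}` the word `"xy"` has the two preimages `"a"` and `"bc"`; a weak coding that erases `b` then maps them to `"a"` and `"c"`. These helpers operate on finite sets only, so they can illustrate the operations but cannot represent the full (generally infinite) languages discussed in the text.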

The size of the insertion grammar is naturally defined by the triple (n,m,m′), where n (resp. m, m′) is the maximal length of the inserted string (resp. left context, right context). If all parameters coincide, we just say that the grammar is of size n. In [27], it was shown for the first time that insertion grammars of size (4,7,6) together with the squeezing mechanism (2) as described above generate all recursively enumerable languages. This result was improved in [21], where insertion grammars of size (3,5,4) are shown to be sufficient for the same task. Continuing with this race, papers [23], [12] show that insertion grammars of size 3 generate all recursively enumerable languages with squeezing mechanisms (2), (3) and (4). In [24], [22] the squeezing mechanism (4) was thoroughly investigated, showing that the result above also holds when some restricted classes of regular languages are used.
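The size measure can be computed mechanically from a rule set. The following is a small sketch, assuming each rule is stored as a hypothetical `(u, x, v)` tuple of left context, inserted string and right context; the function names are illustrative and not taken from the paper.

```python
Rule = tuple[str, str, str]  # (left context u, inserted string x, right context v)


def grammar_size(rules: list[Rule]) -> tuple[int, int, int]:
    """Return (max inserted length, max left context length, max right context length)."""
    n = max((len(x) for _, x, _ in rules), default=0)
    m_left = max((len(u) for u, _, _ in rules), default=0)
    m_right = max((len(v) for _, _, v in rules), default=0)
    return (n, m_left, m_right)


def is_within_size(rules: list[Rule], bound: int) -> bool:
    """Check that inserted strings and both contexts have length at most `bound`."""
    return all(component <= bound for component in grammar_size(rules))
```

For example, the rule list `[("ab", "c", "d"), ("", "ef", "gh")]` has size `(2, 2, 2)`, so it fits the size-2 setting studied in this paper, whereas any rule with a context of three symbols would not.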

A further decrease in the size of rules was achieved only by using additional control mechanisms, like graph-control for size 2 [14], [15] and matrix control [25], [16] for size (1,2,2) [4].

In this paper, we show that it is possible to generate all recursively enumerable languages using insertion grammars of size 2 with squeezing mechanisms (2), (3) and (4). This settles the question of the computational power of this model when all three parameters are the same, as insertion grammars of size (n,1,1) are shown to be context-free [27]. This also proves a remarkable jump when combined with squeezing mechanisms as described above, because the context-free languages are closed under these operations. Hence, while insertion grammars of size 1 with squeezing mechanisms (2), (3) and (4) stay within the context-free languages, size 2 grammars jump up to all recursively enumerable languages. The proof of the main result is greatly simplified by the newly introduced notion of an independent rule set, which makes it possible to formalize the conditions under which the applications of two insertion rules in a derivation are independent and can be done in any order.


Definitions

We assume the reader is familiar with the standard concepts used in formal language theory. We recall some of them in order to fix the notation. Given an alphabet (a finite set) V, let V* denote the set of all strings over V, i.e., the free monoid generated by V; the operator symbol ⋅ (for concatenation) is mostly omitted. For a string x ∈ V*, we denote the length of x by |x| and the empty string is written as λ. If not explicitly stated otherwise, the notion of a morphism refers to a

Mark and migration technique

For the proof of the main result, we use a variant of the mark-and-migration technique, which was introduced in [27] and is commonly used to obtain computational completeness results in the area of insertion systems. This technique is based on a simulation of a type-0 grammar. Since it is not possible to delete symbols in the derivation string, the main idea is to simulate the deletion of a non-terminal symbol X by adding a special marker $ to its left (in [27] two markers # and $ were used). Such a

Main results

Theorem 2

For each recursively enumerable language L, there exist a morphism h, a weak coding g and a language L1 ∈ INS(2,2,2) such that L = g(h⁻¹(L1)).

Proof

Let G=(N,T,P,S) be a type-0 grammar in SGNF. We construct the insertion grammar G1=(V,{øøS$S},P1), where V = N ∪ T ∪ {KAB,KCD} ∪ {X,X̄ | X ∈ {$,,B1,D1}} ∪ {ø}.

The set of rules P1 is constructed as follows. Consider the following sets:L=T{A,C,ø},N=N{KAB,KCD,B1,D1,},N¯={B¯1,D¯1,¯},D={$X|XN}{$¯Y¯|Y¯N¯},F=L2D,S=V{$,$¯}=NN¯T{ø}.

  • For any rule k: X → bY ∈ P we add the

Conclusions

In this paper, we have shown computational completeness of insertion grammars of size 2 enriched with different squeezing mechanisms. Since insertion grammars of size (n,1,1) are known to be context-free [27], only the cases (1,2,2) and (n,m,p) with 1 ≤ n ≤ 2, 0 ≤ p ≤ 1, m ≥ 0, as well as their symmetric variants, remain to be investigated for computational completeness.

The proof of the result was greatly simplified using the concept of an independent rule set. We have checked that the rules given in Theorem 2 are

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (31)

  • B. Galiukschov, Semicontextual grammars

  • V. Geffert, Normal forms for phrase-structure grammars, RAIRO Theor. Inform. Appl. (1991)

  • S. Ivanov et al., Universality of graph-controlled leftist insertion-deletion systems with two states

  • P. Jančar et al., Restarting automata

  • L. Kari et al., At the crossroads of DNA computing and formal languages: characterizing recursively enumerable languages using insertion-deletion systems
