Universal insertion grammars of size two
Introduction
An insertion grammar is a pure grammar (i.e., it has no non-terminal symbols, only terminal ones) having only rules of the form $uv \to uxv$. Such grammars originated in [6]; they are inspired by Marcus contextual grammars [17], [26] used in linguistics. Another motivation for their study comes from biology. As pointed out in [27], the process of mismatched annealing of DNA strands can be seen as an insertion or a deletion of a string in a specified context. A similar process happens in RNA editing [2], where the uracil base U is inserted or deleted in some left context, as well as in CRISPR-Cas9 technology, which uses insertion and deletion to edit the genome [28], [1]. These observations led to an intensive study of insertion-deletion systems (considering insertion and deletion operations together) in the framework of DNA computing [13], [29], [11], [18], [19], [30], [31].
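As an illustration of a single derivation step, the following minimal Python sketch assumes that an insertion rule is given as a triple (u, x, v), read as "x may be inserted between an occurrence of u and an immediately following occurrence of v". The names `InsertionRule` and `apply_rule` are hypothetical and are introduced only for this example.

```python
from typing import NamedTuple, Set


class InsertionRule(NamedTuple):
    left: str      # left context u
    inserted: str  # inserted string x
    right: str     # right context v


def apply_rule(word: str, rule: InsertionRule) -> Set[str]:
    """Return every word obtained from `word` by one application of `rule`."""
    context = rule.left + rule.right
    results = set()
    start = word.find(context)
    while start != -1:
        # Insert x right after the left context u.
        cut = start + len(rule.left)
        results.add(word[:cut] + rule.inserted + word[cut:])
        # Continue the search one position later to catch overlapping matches.
        start = word.find(context, start + 1)
    return results
```

For instance, applying the rule (a, c, b) to the word abab can insert c at either of the two occurrences of ab, and a rule with both contexts empty can insert at every position of the word.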
There are several related models using a similar principle of insertion or deletion of a string in a specified context. We cite guided-insertion systems [3], used to model RNA editing; leftist grammars [20], used to model accessibility problems in protection systems; restarting automata [10], used to model the analysis by reduction; and the insertion operation from [8], introduced as a generalization of concatenation (and which corresponds to a context-free insertion grammar).
Since an insertion grammar is a pure grammar, an additional squeezing mechanism must be used to obtain the final language, as otherwise the described language class has poor closure properties. Usually, restricted transducers are used for this purpose. Standard examples are (1) intersection with the free monoid over a terminal alphabet (used traditionally for Chomsky grammars), (2) projection composed with an inverse morphism, (3) left/right quotient by a regular language, and (4) projection composed with intersection with a regular language. In the area of insertion grammars, variant (2) is mostly used, because (1) yields at most context-sensitive languages [27].
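To make the building blocks of mechanisms (1) and (2) concrete, here is a small Python sketch of the individual operations on single words or finite sets of words. It is only an illustration under simplifying assumptions: symbols are single characters, the morphism is given as a dictionary from symbols to non-empty strings (so every preimage set is finite), and all function names are hypothetical. It is not the construction used in the paper.

```python
from typing import Dict, Iterable, Set


def restrict_to_terminals(words: Iterable[str], terminals: Set[str]) -> Set[str]:
    """Mechanism (1): keep only the words that lie in T* for the terminal set T."""
    return {w for w in words if all(c in terminals for c in w)}


def projection(word: str, keep: Set[str]) -> str:
    """Erase every symbol that is not in `keep`; leave the others unchanged."""
    return "".join(c for c in word if c in keep)


def inverse_morphism(word: str, h: Dict[str, str]) -> Set[str]:
    """Return all strings x with h(x) == word, for a non-erasing morphism h."""
    # preimages[i] holds every x whose image under h equals word[:i].
    preimages = [set() for _ in range(len(word) + 1)]
    preimages[0].add("")
    for i in range(len(word)):
        for prefix in preimages[i]:
            for symbol, image in h.items():
                # Empty images are skipped: this sketch assumes h is non-erasing.
                if image and word.startswith(image, i):
                    preimages[i + len(image)].add(prefix + symbol)
    return preimages[len(word)]
```

Mechanism (2) then corresponds to applying `projection` to every element of `inverse_morphism(w, h)` for each word w of the insertion language.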
The size of an insertion grammar is naturally defined by the triple $(n, m, m')$, where $n$ (resp. $m$, $m'$) is the maximal length of the inserted string (resp. left context, right context). If all parameters coincide, we just say that the grammar is of size $n$. In [27], it was shown for the first time that insertion grammars of size (4,7,6) together with the squeezing mechanism (2) described above generate all recursively enumerable languages. This result was improved in [21], where insertion grammars of size (3,5,4) are shown to be sufficient for the same task. Continuing this race, papers [23], [12] show that insertion grammars of size 3 generate all recursively enumerable languages with squeezing mechanisms (2), (3) and (4). In [24], [22], the squeezing mechanism (4) was thoroughly investigated, showing that the result above also holds when some restricted classes of regular languages are used.
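The size measure can be computed directly from a rule set. The sketch below assumes that each rule is stored as a tuple (left context, inserted string, right context); the tuple layout and the name `grammar_size` are assumptions made for this illustration only.

```python
from typing import Iterable, Tuple

# Assumed layout of a rule: (left context, inserted string, right context).
Rule = Tuple[str, str, str]


def grammar_size(rules: Iterable[Rule]) -> Tuple[int, int, int]:
    """Return (max inserted length, max left context length, max right context length)."""
    rules = list(rules)
    n = max((len(x) for _, x, _ in rules), default=0)
    m = max((len(u) for u, _, _ in rules), default=0)
    m_prime = max((len(v) for _, _, v in rules), default=0)
    return (n, m, m_prime)
```

A grammar is of size 2 in the sense used here when all three components of the returned triple equal 2.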
A further decrease in the size of rules was achieved only by using additional control mechanisms, such as graph control for size 2 [14], [15] and matrix control [25], [16] for size (1,2,2) [4].
In this paper, we show that it is possible to generate all recursively enumerable languages using insertion grammars of size 2 with squeezing mechanisms (2), (3) and (4). This settles the question of the computational power of this model when all three parameters are the same, as insertion grammars of size 1 are shown to be context-free [27]. This also proves a remarkable jump when combined with the squeezing mechanisms described above, because the context-free languages are closed under these operations. Hence, while insertion grammars of size 1 with squeezing mechanisms (2), (3) and (4) stay within the context-free languages, size 2 grammars jump up to all recursively enumerable languages. The proof of the main result is greatly simplified by the newly introduced notion of an independent rule set, which allows one to formalize the conditions under which the applications of two insertion rules in a derivation are independent and can be done in any order.
Definitions
We assume the reader is familiar with the standard concepts of formal language theory. We recall some of them in order to fix the notation. Given an alphabet (a finite set) $V$, let $V^*$ denote the set of all strings over $V$, i.e., the free monoid generated by $V$; the operator symbol ⋅ (for concatenation) is mostly omitted. For a string $x \in V^*$, we denote the length of $x$ by $|x|$, and the empty string is written as λ. If not explicitly stated otherwise, the notion of a morphism refers to a
Mark and migration technique
For the proof of the main result, we use a variant of the mark-and-migration technique introduced in [27], which is commonly used to obtain computational completeness results in the area of insertion systems. This technique is based on a simulation of a type-0 grammar. Since it is not possible to delete symbols in the derivation string, the main idea is to simulate the deletion of a non-terminal symbol X by adding a special marker $ to its left (in [27], two markers # and $ were used). Such a
Main results
Theorem 2. For each recursively enumerable language L, there exists a morphism h, a weak coding g and a language such that .
Proof. Let be a type-0 grammar in SGNF. We construct the insertion grammar , where The set of rules is constructed as follows. Consider the following sets: For any rule we add the
Conclusions
In this paper, we have shown computational completeness of insertion grammars of size 2 enriched with different squeezing mechanisms. Since insertion grammars of size are known to be context-free [27], only cases , , , , as well as their symmetric variants remain to be investigated for computational completeness.
The proof of the result was greatly simplified using the concept of an independent rule set. We have checked that the rules given in Theorem 2 are
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- et al., Regulated RNA rewriting: modelling RNA editing with guided insertion, Theor. Comput. Sci. (2007)
- Insertion languages, Inf. Sci. (1983)
- et al., On the weight of universal insertion grammars, Theor. Comput. Sci. (2008)
- et al., Contextual insertions/deletions and computability, Inf. Comput. (1996)
- et al., Context-free insertion-deletion systems, Theor. Comput. Sci. (2005)
- et al., Matrix insertion-deletion systems, Theor. Comput. Sci. (2012)
- The CRISPR tool kit for genome editing and beyond, Nat. Commun. (2018)
- et al., Universal matrix insertion grammars with small size
- et al., Graph-controlled insertion-deletion systems
- Semicontextual grammars
- Normal forms for phrase-structure grammars, RAIRO Theor. Inform. Appl.
- Universality of graph-controlled leftist insertion-deletion systems with two states
- Restarting automata
- At the crossroads of DNA computing and formal languages: characterizing recursively enumerable languages using insertion-deletion systems