A periodicity lemma for partial words☆
Introduction
Periodicity is a fundamental concept in combinatorics on words; see, e.g., the book [2]. A ubiquitous tool in periodicity on standard words is Fine and Wilf's Periodicity Lemma [3]. Our work can be seen as a part of the quest to extend Fine and Wilf's result to partial words, that is, to words with don't care symbols. Other known extensions of the Periodicity Lemma include a variant with three [4] and an arbitrary number of specified periods [5], [6], the so-called new periodicity lemma [7], [8], a periodicity lemma for repetitions with morphisms [9], extensions into abelian [10] and k-abelian [11] periodicity, into abelian periodicity for partial words [12], into bidimensional words [13], and other variations [14], [15].
Consider a word $X$ of length $|X| = n$, with its positions numbered $0$ through $n-1$. We say that $X$ has a period $p$ if $X[i] = X[i+p]$ for all $0 \le i < n-p$. In this case, the prefix $X[0..p-1]$ is called a word period of $X$. The original result of Fine and Wilf can be stated as follows.
Lemma 1.1 (Periodicity Lemma [3]) If $p, q$ are periods of a word $X$ of length $|X| \ge p + q - \gcd(p,q)$, then $\gcd(p,q)$ is also a period of $X$.
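For small periods, Lemma 1.1 and the optimality of its bound can be confirmed by exhaustive search. The Python sketch below is only an illustration under stated assumptions: it searches binary words only, and `has_period` and `check_fine_wilf` are hypothetical helper names rather than anything defined in the paper.

```python
from itertools import product
from math import gcd


def has_period(x, p):
    """True iff x[i] == x[i + p] for all 0 <= i < len(x) - p (total words only)."""
    return all(x[i] == x[i + p] for i in range(len(x) - p))


def check_fine_wilf(p, q, alphabet="ab"):
    """Exhaustively test Lemma 1.1 for one pair (p, q) over a small alphabet.

    Returns (lemma_holds, bound_is_tight), where
      lemma_holds:    every word of length p + q - gcd(p, q) with periods p, q
                      also has period gcd(p, q);
      bound_is_tight: some word one letter shorter has periods p, q
                      but not period gcd(p, q).
    """
    d = gcd(p, q)
    n = p + q - d

    def words(length):
        return ("".join(w) for w in product(alphabet, repeat=length))

    def both_periods(w):
        return has_period(w, p) and has_period(w, q)

    lemma_holds = all(has_period(w, d) for w in words(n) if both_periods(w))
    bound_is_tight = any(not has_period(w, d) for w in words(n - 1) if both_periods(w))
    return lemma_holds, bound_is_tight


print(check_fine_wilf(5, 7))  # (True, True)
print(check_fine_wilf(4, 6))  # (True, True)
```

The search is exponential in $p + q$, so it only serves as a sanity check of the statement, not as a way to compute thresholds.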
A partial word is a word over the alphabet $\Sigma \cup \{\diamond\}$, where ⋄ denotes a hole (a don't care symbol). A partial word that does not contain hole symbols is called a total word or simply a word. In what follows, by $n$ we denote the length of the partial word and by $h$ the number of holes. For $a, b \in \Sigma \cup \{\diamond\}$, the relation of matching $\approx$ is defined so that $a \approx b$ if $a = b$ or either of these symbols is a hole. A word $P$ of length $p$ is a word period of a partial word $X$ if $X[i] \approx P[i \bmod p]$ for $0 \le i < n$. In this case, we say that the integer $p$ is a period of $X$.
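The definition of a period of a partial word can be sketched as follows. The helper names (`matches`, `has_word_period`, `matches_locally`) are hypothetical, and the hole is represented by the character ⋄. The check relies on the observation that a word period $P$ exists exactly when, in every residue class modulo $p$, all non-hole symbols agree.

```python
HOLE = "⋄"  # the hole (don't care) symbol


def matches(a, b):
    """The matching relation: a ≈ b iff a == b or one of them is a hole."""
    return a == b or a == HOLE or b == HOLE


def has_word_period(x, p):
    """True iff some total word P of length p satisfies x[i] ≈ P[i mod p] for all i.

    Such a P exists exactly when, in every residue class modulo p, all non-hole
    symbols of x coincide; P then takes that letter (any letter if the class
    contains only holes).
    """
    forced = {}  # residue class -> letter forced on P at that position
    for i, c in enumerate(x):
        if c == HOLE:
            continue  # a hole matches any letter of P
        if forced.setdefault(i % p, c) != c:
            return False
    return True


def matches_locally(x, p):
    """Weaker, purely local test x[i] ≈ x[i + p] for all valid i (shown for contrast)."""
    return all(matches(x[i], x[i + p]) for i in range(len(x) - p))


print(has_word_period("aba⋄a", 2))  # True
print(has_word_period("a⋄b", 1))    # False: positions 0 and 2 force different letters
print(matches_locally("a⋄b", 1))    # True: every adjacent pair matches via the hole
```

Design note: the matching relation $\approx$ is not transitive, so comparing only positions at distance $p$ is weaker than requiring a word period $P$; the last two calls illustrate the difference.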
We aim to compute the minimal lengths $L(h,p,q)$ which make the following generalization of the periodicity lemma valid:
Lemma 1.2 (Periodicity Lemma for Partial Words) If $p, q$ are periods of a partial word $X$ with $h$ holes and length $|X| \ge L(h,p,q)$, then $\gcd(p,q)$ is also a period of $X$.
Our results. We examine the values as a function of for a given h. We use two auxiliary functions and , which correspond to two restricted families of partial words; corresponds to special arrangements of Shur and Gamzova [19] and to a restricted family of words that were used by Blanchet-Sadri et al. [20]. As our main combinatorial result, we prove that is always equal to or and we characterize the arguments h for which either case holds. Finally, this lets us derive a closed-form formula for with arbitrary fixed h using a sequence of fractions. Our construction relies on the theory of continued fractions; we also apply this link to describe in terms of standard Sturmian words.
As for algorithmic results, we show how to compute $L(h,p,q)$ using $\mathcal{O}(\log p + \log q)$ arithmetic operations, improving upon the state-of-the-art complexity $\mathcal{O}(p+q)$. Furthermore, for any fixed h in time we can compute a compact description of the length function . For the base case of , , and , the representation is piecewise linear in p and q. More precisely, the interval can be split into subintervals I so that restricted to is of the form for some integers .
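Preliminaries. If $\gcd(p,q) \in \{p, q\}$, that is, if one of $p, q$ divides the other, then $\gcd(p,q)$ is itself one of the periods, so Lemma 1.2 trivially holds for each partial word X. Otherwise, as proved by Fine and Wilf [3], for $h = 0$ the threshold $p + q - \gcd(p,q)$ in Lemma 1.1 is known to be optimal, so $L(0,p,q) = p + q - \gcd(p,q)$.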
A partial word X is called unary if it has period 1 and non-unary otherwise.
Example 1.3 $L(1,5,7) = 12$, because each partial word of length at least 12 with one hole and periods 5 and 7 also has period $\gcd(5,7) = 1$, whereas there exists a partial word of length 11 with one hole that is non-unary and has periods 5 and 7.
As an intermediate step, we consider a dual holes function $H(n,p,q)$, which gives the minimum number $h$ of holes for which there is a partial word of length $n$ with $h$ holes and periods $p$ and $q$ that does not satisfy the conclusion of Lemma 1.2, that is, does not have period $\gcd(p,q)$.
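Example 1.4 We have $H(11,5,7) = 1$ because $H(11,5,7) > 0$ (due to the classic periodicity lemma, every total word of length 11 with periods 5 and 7 has period $\gcd(5,7) = 1$) and $H(11,5,7) \le 1$ (the partial word of length 11 from Example 1.3 is non-unary, has one hole and periods 5 and 7).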
We have $H(12,5,7) \le H(11,5,7) + 1 = 2$ since appending ⋄ preserves periods. In fact, $H(n,5,7) = 2$ for $12 \le n \le 15$. However, there is no non-unary partial word of length 16 with 2 holes and periods 5, 7, so $H(16,5,7) \ge 3$; see Table 1.
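The values in Examples 1.3 and 1.4 and in the paragraph above can be checked by brute force on tiny inputs. The sketch below assumes coprime $p$ and $q$, enumerates hole sets by increasing size, and uses union-find to group the non-hole positions that the periods force to carry the same letter; the partial word can be made non-unary as soon as two or more groups remain. `holes_function` and `forced_classes` are hypothetical names, and the search is exponential in $n$, so it is no substitute for the algorithms developed later in the paper.

```python
from itertools import combinations
from math import gcd

HOLE = "⋄"


def forced_classes(n, p, q, holes):
    """Union-find over the non-hole positions of a length-n partial word:
    positions congruent modulo p (or modulo q) must carry the same letter."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    first_in_class = {}  # (modulus, residue) -> first non-hole position in that class
    for i in range(n):
        if i in holes:
            continue
        for m in (p, q):
            j = first_in_class.setdefault((m, i % m), i)
            parent[find(i)] = find(j)
    return find


def holes_function(n, p, q):
    """Brute-force H(n, p, q) for coprime p, q (exponential; tiny n only).

    Returns (h, witness): the least number h of holes admitting a non-unary
    partial word of length n with periods p and q, and one such word.
    Returns None if no such word exists (e.g. n < 2).
    """
    assert gcd(p, q) == 1
    for h in range(n + 1):
        for holes in combinations(range(n), h):
            hole_set = set(holes)
            find = forced_classes(n, p, q, hole_set)
            roots = {find(i) for i in range(n) if i not in hole_set}
            if len(roots) >= 2:
                # Two or more classes: give one class 'a' and all others 'b'.
                first = min(roots)
                word = "".join(
                    HOLE if i in hole_set else ("a" if find(i) == first else "b")
                    for i in range(n))
                return h, word
    return None
```

For $p = 5$ and $q = 7$ the search reports $h = 1$ at $n = 11$ and $h = 3$ at $n = 16$.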
For a function $f(n,p,q)$ monotone in $n$, we define its generalized inverse as: $f^{(-1)}(h,p,q) = \min\{n : f(n,p,q) > h\}$.
Observation 1.5 $L(h,p,q) = H^{(-1)}(h,p,q)$.
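Observation 1.5 turns a threshold into a search problem. Reading the generalized inverse as $\min\{n : f(n) > h\}$ for a non-decreasing $f$ (an assumption for this sketch, with $p, q$ fixed), it can be evaluated by exponential search followed by binary search. `generalized_inverse` is a hypothetical helper, and `f` is any monotone callable.

```python
def generalized_inverse(f, h, start=0):
    """Smallest n >= start with f(n) > h, for a non-decreasing, unbounded f.

    Monotonicity is what makes the search valid: once f exceeds h it stays above h.
    """
    if f(start) > h:
        return start
    lo, step = start, 1          # invariant: f(lo) <= h
    hi = start + step
    while f(hi) <= h:            # exponential search for some hi with f(hi) > h
        lo = hi
        step *= 2
        hi = start + step
    while hi - lo > 1:           # binary search, keeping f(lo) <= h < f(hi)
        mid = (lo + hi) // 2
        if f(mid) > h:
            hi = mid
        else:
            lo = mid
    return hi


print(generalized_inverse(lambda n: n // 3, 1))  # 6
print(generalized_inverse(lambda n: n * n, 10))  # 4
```

With `f` set to a brute-force evaluation of $H(\cdot, 5, 7)$ and $h = 1$, this search returns 12, consistent with Example 1.3; it only needs $\mathcal{O}(\log n)$ evaluations of `f`, but each evaluation may itself be expensive.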
As observed above, Lemma 1.2 becomes trivial if p divides q (noted as ). The case of is known to be special as well, but it has been fully described in [17]. Furthermore, it was shown in [19], [20] that the case of is easily reducible to that of . We recall these existing results in Section 4, while in the other sections we assume that and .
Overview of the paper. In Section 2 we characterize and its generalized inverse , whereas in Section 3 we study and its generalized inverse . Our main combinatorial result, dubbed Characterization Theorem, is stated in Section 4; however, we defer its laborious proof to Section 9. It lets us design a slower, -time algorithm for computing . This running time is improved to in Section 6, which is preceded by Section 5, where we recall the necessary number-theoretic tools related to continued fractions and Farey sequences. Finally, closed-form formulae for L and are developed in Section 7, and Section 8 relates to standard Sturmian words.
Functions and
For relatively prime integers , , and an integer , let us define We shall prove that for a suitable range of lengths n.
Fine and Wilf [3] constructed a word of length $p + q - 2$ with periods $p$ and $q$ and without period 1. For given we choose such a total word and define a partial word as follows, setting (see Fig. 1): Note that .
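A word with the properties of the Fine and Wilf construction can be built with union-find. The sketch assumes coprime $2 \le p < q$; `fine_wilf_word` is a hypothetical name, and the argument in its docstring is a standard one, not taken from the paper.

```python
from math import gcd


def fine_wilf_word(p, q):
    """Binary word of length p + q - 2 with periods p and q but not period 1.

    Assumes coprime 2 <= p < q. Positions i and i + p (and i and i + q) are
    linked. On p + q - 1 positions the links form a spanning tree (connected
    by Lemma 1.1, with exactly p + q - 2 links); the last position has only the
    two links to q - 2 and p - 2, so dropping it leaves exactly two trees.
    """
    assert gcd(p, q) == 1 and 2 <= p < q
    n = p + q - 2
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for d in (p, q):
        for i in range(n - d):
            parent[find(i + d)] = find(i)
    root = find(0)
    # One tree gets 'a', the other 'b'; linked positions always share a letter.
    return "".join("a" if find(i) == root else "b" for i in range(n))


print(fine_wilf_word(5, 7))  # ababaababa
```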
Example 2.1 For and , we
Functions and
In this section, we study a family of partial words corresponding to the special arrangements introduced in [19]. For relatively prime integers , we say that a partial word S of length is -special if it has a position l such that for each position i: See Fig. 2 for an example. The -special partial words can be characterized as follows.
Observation 3.1 Assume that S is a partial word over a binary alphabet with coprime periods p and q.
First algorithm for computing the length function L
Shur and Gamzova in [19] proved that for . We give a complete characterization of H in terms of and , and we derive an analogous characterization of L in terms of and . The tedious proof, based on a graph-theoretic approach similar to that in [20], is postponed to the last section.
Theorem 4.1 Characterization Theorem Let p and q be relatively prime integers such that . For each integer , we have Moreover, for each integer
Number-theoretic tools
A more efficient algorithm for evaluating L relies on the theory of continued fractions; we refer to [25] and [26] for a self-contained yet compact introduction. A finite continued fraction is a sequence $[a_0; a_1, \ldots, a_m]$, where $a_0$ is a non-negative integer and $a_i$ is a positive integer for $1 \le i \le m$. We associate it with the following rational number: $a_0 + \cfrac{1}{a_1 + \cfrac{1}{\ddots + \cfrac{1}{a_m}}}$. Depending on the parity of m, we distinguish odd and even continued fractions. Often, an improper continued fraction is also introduced and assumed to be odd.
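The expansion of a fraction into a continued fraction is produced by Euclid's algorithm, which needs only $\mathcal{O}(\log \min(p, q))$ division steps. The sketch below (hypothetical helper names) computes the expansion, evaluates a continued fraction back to an exact `Fraction`, and returns the two expansions that every positive rational has, one odd and one even, via the identity $[\ldots, a_m] = [\ldots, a_m - 1, 1]$.

```python
from fractions import Fraction


def continued_fraction(num, den):
    """Expansion [a_0; a_1, ..., a_m] of num/den (num >= 0, den > 0) by Euclid's algorithm."""
    terms = []
    while den:
        a, r = divmod(num, den)
        terms.append(a)
        num, den = den, r
    return terms


def value(terms):
    """Evaluate a_0 + 1/(a_1 + 1/(... + 1/a_m)) exactly, innermost term first."""
    x = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        x = a + 1 / x
    return x


def both_expansions(num, den):
    """Return (odd, even): the two expansions of num/den > 0.

    Euclid's last quotient is >= 2 whenever m >= 1, so [..., a_m] can be
    rewritten as [..., a_m - 1, 1]; for an integer value, [a_0] = [a_0 - 1; 1].
    The two lengths differ by one, so exactly one expansion has odd m.
    """
    assert num > 0 and den > 0
    canon = continued_fraction(num, den)
    alt = canon[:-1] + [canon[-1] - 1, 1]
    return (canon, alt) if (len(canon) - 1) % 2 == 1 else (alt, canon)


print(continued_fraction(5, 7))  # [0, 1, 2, 2]
print(both_expansions(5, 7))     # ([0, 1, 2, 2], [0, 1, 2, 1, 1])
```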
Faster algorithm for computing the length function L
The bottleneck of Algorithm 1 is evaluating the function using Lemma 3.4(b). Thus, in this section we apply Fact 5.4 to develop a more efficient characterization of . Let us start with an auxiliary lemma providing upper and lower bounds for the function (whose generalized inverse is ). Lemma 6.1 If , then Proof Recall that as stated in Lemma 3.4(a). The first part of the claim holds because for we have
Closed-form formula for
In this section, we show how to compute a compact representation of the function in time. We start with such a representation for , which follows from a combination of Lemmas 6.2 and 3.6.
Recall the basic points and middle points defined in Section 3.1. Here, we use -basic points and -middle points defined for as respectively Note that for each i, we have , but none of the inequalities is strict in
Relation to standard Sturmian words
For a finite sequence of positive integers, a Sturmian word is recursively defined as , where , , and for ; see [28, Chapter 2]. The sequence γ is called the directive sequence of the Sturmian word . We classify directive sequences γ (and the Sturmian words ) into even and odd based on the parity of m.
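The recursion for standard Sturmian words is easy to run directly. Because the base case and the exact form of the recursion are not recoverable from the text above, the sketch assumes the convention $X_{-1} = b$, $X_0 = a$, $X_i = X_{i-1}^{\gamma_i} X_{i-2}$ for $1 \le i \le m$; under that assumption it also checks that the last two letters depend only on the parity of $m$, which is the phenomenon described in Observation 8.1 (which letter pair goes with which parity depends on the chosen convention). `sturmian` and `endings_by_parity` are hypothetical names.

```python
from itertools import product


def sturmian(gamma):
    """Standard Sturmian word for directive sequence gamma = (g_1, ..., g_m).

    ASSUMED convention: X_{-1} = "b", X_0 = "a", X_i = X_{i-1}^{g_i} X_{i-2}.
    """
    prev, cur = "b", "a"  # X_{-1}, X_0
    for g in gamma:
        prev, cur = cur, cur * g + prev
    return cur


def endings_by_parity(max_m=4, max_g=3):
    """Collect the last two letters of all words with 1 <= m <= max_m and
    1 <= g_i <= max_g, grouped by the parity of m."""
    seen = {"odd": set(), "even": set()}
    for m in range(1, max_m + 1):
        for gamma in product(range(1, max_g + 1), repeat=m):
            seen["odd" if m % 2 else "even"].add(sturmian(gamma)[-2:])
    return seen


print(sturmian([2, 1, 2]))  # aabaaabaaab
print(endings_by_parity())  # {'odd': {'ab'}, 'even': {'ba'}}
```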
Observation 8.1 Odd Sturmian words of length at least 2 end with , while even Sturmian words of length at least 2 end with .
For a directive sequence
Proof of the characterization theorem
Let us define the $(n,p,q)$-graph $G$ as an undirected graph with vertices $\{0, \ldots, n-1\}$. Distinct vertices $i$ and $j$ are connected if and only if $i \equiv j \pmod{p}$ or $i \equiv j \pmod{q}$. Observe that $H(n,p,q)$ is the minimum size of a vertex separator in $G$, i.e., the minimum number of vertices to be removed from $G$ so that the resulting graph is no longer connected; see Fig. 6.
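Rather than enumerating hole sets, the separator view allows a polynomial-time computation for a single length $n$. The sketch below builds the graph with the adjacency rule $i \equiv j \pmod p$ or $i \equiv j \pmod q$ (our reading of the definition above), then applies Menger's theorem: for non-adjacent $s$ and $t$, the minimum $s$-$t$ vertex cut equals a maximum flow after splitting every vertex into an in-copy and an out-copy joined by a unit-capacity arc. Minimizing over all non-adjacent pairs gives the size of a minimum vertex separator. All function names are hypothetical, and this is a textbook technique rather than the method used in the paper.

```python
from collections import deque


def periodicity_graph(n, p, q):
    """Adjacency sets: distinct i, j adjacent iff i ≡ j (mod p) or i ≡ j (mod q)."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if (j - i) % p == 0 or (j - i) % q == 0:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def local_vertex_connectivity(adj, s, t):
    """Minimum number of vertices (other than s, t) separating non-adjacent s, t,
    computed as a max flow where each vertex v is split into v_in -> v_out."""
    n = len(adj)
    big = n + 1                                  # effectively infinite capacity
    res, nbrs = {}, [[] for _ in range(2 * n)]   # residual capacities, arc lists

    def add_arc(u, v, c):
        if (u, v) not in res:                    # create the arc and its reverse once
            res[(u, v)] = res[(v, u)] = 0
            nbrs[u].append(v)
            nbrs[v].append(u)
        res[(u, v)] += c

    for v in range(n):                           # v_in = 2v, v_out = 2v + 1
        add_arc(2 * v, 2 * v + 1, big if v in (s, t) else 1)
        for w in adj[v]:
            add_arc(2 * v + 1, 2 * w, big)
    source, sink, flow = 2 * s + 1, 2 * t, 0
    while True:                                  # Edmonds-Karp: shortest augmenting paths
        prev, queue = {source: None}, deque([source])
        while queue and sink not in prev:
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in prev and res[(u, v)] > 0:
                    prev[v] = u
                    queue.append(v)
        if sink not in prev:
            return flow
        v, push = sink, big
        while prev[v] is not None:               # bottleneck along the path
            push = min(push, res[(prev[v], v)])
            v = prev[v]
        v = sink
        while prev[v] is not None:               # update residual capacities
            res[(prev[v], v)] -= push
            res[(v, prev[v])] += push
            v = prev[v]
        flow += push


def min_vertex_separator(n, p, q):
    """Size of a minimum vertex separator of the graph (None if it is complete)."""
    adj = periodicity_graph(n, p, q)
    cuts = [local_vertex_connectivity(adj, s, t)
            for s in range(n) for t in range(s + 1, n) if t not in adj[s]]
    return min(cuts) if cuts else None


print(min_vertex_separator(16, 5, 7))  # 3
```

A graph that is already disconnected yields 0, matching the fact that total words of length at most $p + q - 2$ can avoid period 1.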
We say that an edge of the -graph is a p-edge if and a q-edge if . The set of all nodes giving the same remainder modulo
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (28)
- et al. Fine and Wilf's theorem for three periods and a generalization of Sturmian words. Theor. Comput. Sci. (1999)
- et al. Fine and Wilf words for any periods II. Theor. Comput. Sci. (2009)
- et al. The new periodicity lemma revisited. Discrete Appl. Math. (2016)
- et al. On Fine and Wilf's theorem for bidimensional words. Theor. Comput. Sci. (2003)
- et al. Partial words and a theorem of Fine and Wilf. Theor. Comput. Sci. (1999)
- et al. Graph connectivity, partial words, and a theorem of Fine and Wilf. Inf. Comput. (2008)
- et al. Periods in partial words: an algorithm. J. Discret. Algorithms (2012)
- et al. Partial words and a theorem of Fine and Wilf revisited. Theor. Comput. Sci. (2002)
- Periodicity on partial words. Comput. Math. Appl. (2004)
- et al. A new approach to the periodicity lemma on strings with holes. Theor. Comput. Sci. (2009)
- On periodicity lemma for partial words. Combinatorics on Words
- Uniqueness theorems for periodic functions. Proc. Am. Math. Soc.
- On a paper by Castelli, Mignosi, Restivo. RAIRO Theor. Inform. Appl.
☆ Supported by the Polish National Science Center, grant no. 2014/13/B/ST6/00770. A preliminary version of this work appeared at LATA 2018 [1].