A periodicity lemma for partial words

https://doi.org/10.1016/j.ic.2020.104677

Abstract

We investigate the function L(h,p,q), called here the length function, such that L(h,p,q) is the minimum length which guarantees that a natural extension of the periodicity lemma is valid for partial words with h holes and (so-called strong) periods p,q. In a series of papers, the formulae for the length function, in terms of p and q, were provided for each fixed $h \le 7$. We demystify the generic structure of such formulae and give a complete characterization of the length function for any parameter h expressed in terms of a piecewise-linear function with O(h) pieces. We also show how to evaluate the length function in $O(\log p + \log q)$ time, which is an improvement upon the best previously known $O(p+q)$-time algorithm.

Introduction

Periodicity is a fundamental concept in combinatorics on words; see, e.g., the book [2]. A ubiquitous tool in periodicity on standard words is Fine and Wilf's Periodicity Lemma [3]. Our work can be seen as a part of the quest to extend Fine and Wilf's result to partial words, that is, to words with don't care symbols. Other known extensions of the Periodicity Lemma include a variant with three [4] and an arbitrary number of specified periods [5], [6], the so-called new periodicity lemma [7], [8], a periodicity lemma for repetitions with morphisms [9], extensions into abelian [10] and k-abelian [11] periodicity, into abelian periodicity for partial words [12], into bidimensional words [13], and other variations [14], [15].

Consider a word X of length |X| = n, with its positions numbered 0 through $n-1$. We say that X has a period p if $X[i] = X[i+p]$ for all $0 \le i < n-p$. In this case, the prefix $P = X[0..p-1]$ is called a word period of X. The original result of Fine and Wilf can be stated as follows.

Lemma 1.1 Periodicity Lemma [3]

If p,q are periods of a word X of length $|X| \ge p + q - \gcd(p,q)$, then $\gcd(p,q)$ is also a period of X.

A partial word is a word over the alphabet $\Sigma \cup \{\diamond\}$, where ⋄ denotes a hole (a don't care symbol). A partial word that does not contain hole symbols is called a total word or simply a word. In what follows, by n we denote the length of the partial word and by h the number of holes. For $a, b \in \Sigma \cup \{\diamond\}$, the matching relation ≈ is defined so that $a \approx b$ if $a = b$ or either of these symbols is a hole. A word P of length p is a word period of a partial word X if $X[i] \approx P[i \bmod p]$ for $0 \le i < n$. In this case, we say that the integer p is a period of X.
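As an illustration of this definition, here is a minimal Python sketch of the (strong) period test. It assumes, purely for illustration, that a partial word is stored as a string in which the character ⋄ marks a hole; the helper name `has_period` is hypothetical. Because P must be a total word, p is a period of X exactly when the non-hole symbols in every residue class modulo p coincide, so no explicit search for P is needed.

```python
HOLE = "⋄"  # illustrative hole symbol


def has_period(x, p):
    """Return True iff some total word P of length p satisfies
    x[i] ≈ P[i mod p] for every position i of the partial word x.

    Such a P exists exactly when, in each residue class modulo p,
    all non-hole symbols are equal (P then stores that symbol)."""
    seen = {}  # residue -> first non-hole symbol seen in that class
    for i, c in enumerate(x):
        if c == HOLE:
            continue
        if seen.setdefault(i % p, c) != c:
            return False  # two different letters in one residue class
    return True
```

For example, `has_period("ababaababa⋄", 5)` and `has_period("ababaababa⋄", 7)` are both true, whereas `has_period("ababaababa⋄", 1)` is false, so this length-11 partial word is non-unary.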

We aim to compute the minimal lengths L(h,p,q) which make the following generalization of the periodicity lemma valid:

Lemma 1.2 Periodicity Lemma for Partial Words

If p,q are periods of a partial word X with h holes and length $|X| \ge L(h,p,q)$, then $\gcd(p,q)$ is also a period of X.

Previous results. Periods that are studied in this work are also known as strong periods, in contrast with weak periods, which are defined as positive integers p such that $X[i] \approx X[i+p]$ holds for all $0 \le i < |X| - p$. The study of periods in partial words was initiated by Berstel and Boasson [16], who proved that L(1,p,q)=p+q. They also showed that the same bound holds for weak periods p and q. Shur and Konovalova [17] developed exact formulae for L(2,p,q) and L(h,2,q) and an upper bound for L(h,p,q). A formula for L(h,p,q) with small values of h was shown by Blanchet-Sadri et al. [18], whereas for large h, Shur and Gamzova [19] proved that the optimal counterexamples of length $L(h,p,q) - 1$ belong to a very restricted class of special arrangements. The latter contribution leads to an O(p+q)-time algorithm for computing L(h,p,q). An alternative procedure with the same running time was shown by Blanchet-Sadri et al. [20], who also stated closed-form formulae for L(h,p,q) with $h \le 7$, expressed using a constant number of functions linear in p, q, and gcd(p,q). Weak periods of partial words were further considered in [21], [22], [23], [24].
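To make the strong/weak distinction concrete, the following Python sketch checks both notions; the helper names and the convention of writing holes as ⋄ inside a string are illustrative assumptions. Every strong period is also a weak one, but not conversely, because ≈ is not transitive: in a⋄b every pair at distance 1 matches, yet a and b cannot both equal the single letter of a period of length 1.

```python
HOLE = "⋄"  # illustrative hole symbol


def matches(a, b):
    """The matching relation ≈: equal symbols, or at least one hole."""
    return a == b or HOLE in (a, b)


def has_weak_period(x, p):
    """Weak period: x[i] ≈ x[i + p] for every 0 <= i < len(x) - p."""
    return all(matches(x[i], x[i + p]) for i in range(len(x) - p))


def has_strong_period(x, p):
    """Strong period: non-hole symbols agree within each class mod p."""
    seen = {}
    for i, c in enumerate(x):
        if c != HOLE and seen.setdefault(i % p, c) != c:
            return False
    return True
```

For instance, `has_weak_period("a⋄b", 1)` is true while `has_strong_period("a⋄b", 1)` is false.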

Our results. We examine the values L(h,p,q) as a function of p,q for a given h. We use two auxiliary functions Ls(h,p,q) and Ld(h,p,q), which correspond to two restricted families of partial words; Ld corresponds to special arrangements of Shur and Gamzova [19] and Ls to a restricted family of words that were used by Blanchet-Sadri et al. [20]. As our main combinatorial result, we prove that L(h,p,q) is always equal to Ls(h,p,q) or Ld(h,p,q) and we characterize the arguments h for which either case holds. Finally, this lets us derive a closed-form formula for L(h,p,q) with arbitrary fixed h using a sequence of O(h) fractions. Our construction relies on the theory of continued fractions; we also apply this link to describe L(h,p,q) in terms of standard Sturmian words.

As for algorithmic results, we show how to compute L(h,p,q) using $O(\log p + \log q)$ arithmetic operations, improving upon the state-of-the-art complexity O(p+q). Furthermore, for any fixed h, we can compute a compact description of the length function L(h,p,q) in $O(h \log h)$ time. For the base case of $p < q$, $\gcd(p,q) = 1$, and $h < p + q - 2$, the representation is piecewise linear in p and q. More precisely, the interval [0,1] can be split into O(h) subintervals I so that L(h,p,q) restricted to $p/q \in I$ is of the form $ap + bq + c$ for some integers a, b, c.

Preliminaries. If $\gcd(p,q) \in \{p, q\}$, then Lemma 1.2 trivially holds for each partial word X. Otherwise, Fine and Wilf [3] proved that for h = 0 the threshold in Lemma 1.1 is optimal, so $L(0,p,q) = p + q - \gcd(p,q)$.

A partial word X is called unary if it has period 1 and non-unary otherwise.

Example 1.3

L(1,5,7)=12, because

  • each partial word of length at least 12 with one hole and periods 5, 7 also has period 1 = gcd(5,7),

  • the partial word ababaababa⋄ of length 11 is non-unary and has periods 5, 7.

As an intermediate step, we consider a dual holes function h = H(n,p,q), which gives the minimum number h of holes for which there is a partial word of length n with h holes and periods p, q that does not satisfy the conclusion of Lemma 1.2, that is, does not have period gcd(p,q).

Example 1.4

We have H(11,5,7)=1 because

  • $H(11,5,7) \ge 1$: due to the classic periodicity lemma, every total word of length 11 with periods 5 and 7 has period $1 = \gcd(5,7)$, and

  • $H(11,5,7) \le 1$: ababaababa⋄ is non-unary, has one hole and periods 5, 7.

We have $H(12,5,7) \le H(11,5,7) + 1 = 2$ since appending ⋄ preserves periods. In fact, $H(12,5,7) = \dots = H(15,5,7) = 2$. However, there is no non-unary partial word of length 16 with 2 holes and periods 5, 7, so $L(2,5,7) = 16$; see Table 1.

For a function f(n,p,q) monotone in n, we define its generalized inverse as $\tilde{f}(h,p,q) = \min\{n : f(n,p,q) > h\}$.

Observation 1.5

$L = \tilde{H}$.
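For small parameters, H and its generalized inverse L can be cross-checked by exhaustive search. The Python sketch below is only such a brute-force check, not the paper's algorithm; it assumes gcd(p,q) = 1 and p, q ≥ 2, so that violating Lemma 1.2 simply means being non-unary, and the names `count_classes`, `H` and `L` are ours. For a fixed set of hole positions, a suitable non-unary word exists exactly when the non-hole positions split into at least two classes after merging positions with equal residues modulo p or modulo q (write a on one class and b on the others). The search for L starts at max(p, q), because every length 2 ≤ n < max(p, q) already admits a hole-free non-unary word with periods p and q.

```python
from itertools import combinations


def count_classes(n, p, q, holes):
    """Merge non-hole positions of [0, n) sharing a residue mod p or mod q
    (union-find) and return the number of resulting classes."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for m in (p, q):
        last = {}  # residue -> most recent non-hole position with it
        for i in range(n):
            if i in holes:
                continue
            r = i % m
            if r in last:
                parent[find(last[r])] = find(i)
            last[r] = i
    return len({find(i) for i in range(n) if i not in holes})


def H(n, p, q):
    """Fewest holes in a non-unary partial word of length n >= 2 with
    periods p, q (exhaustive search; assumes gcd(p, q) = 1, p, q >= 2)."""
    for h in range(n - 1):  # h = n - 2 always works: keep positions 0, 1
        for holes in combinations(range(n), h):
            if count_classes(n, p, q, set(holes)) >= 2:
                return h
    raise ValueError("unreachable for n >= 2 and p, q >= 2")


def L(h, p, q):
    """Generalized inverse of H: the least n with H(n, p, q) > h."""
    n = max(p, q)
    while H(n, p, q) <= h:
        n += 1
    return n
```

With p = 5 and q = 7 this reproduces the values stated in Examples 1.3 and 1.4, namely H(11,5,7) = 1, L(1,5,7) = 12 and L(2,5,7) = 16. The running time grows exponentially with h, so the sketch is only useful for sanity checks on tiny inputs.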

As observed above, Lemma 1.2 becomes trivial if p divides q (denoted p|q). The case of p|2q is known to be special as well, but it was fully described in [17]. Furthermore, it was shown in [19], [20] that the case of gcd(p,q)>1 is easily reducible to that of gcd(p,q)=1. We recall these existing results in Section 4, while in the other sections we assume that gcd(p,q)=1 and p,q>2.

Overview of the paper. In Section 2 we characterize Ls and its generalized inverse Hs, whereas in Section 3 we study Ld and its generalized inverse Hd. Our main combinatorial result, dubbed Characterization Theorem, is stated in Section 4; however, we defer its laborious proof to Section 9. It lets us design a slower, $O(h + \log p + \log q)$-time algorithm for computing L(h,p,q). This running time is improved to $O(\log p + \log q)$ in Section 6, which is preceded by Section 5, where we recall the necessary number-theoretic tools related to continued fractions and Farey sequences. Finally, closed-form formulae for L and Ld are developed in Section 7, and Section 8 relates Ld to standard Sturmian words.


Functions Hs and Ls

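For relatively prime integers p, q with $1 < p < q$ and an integer $n \ge q$, let us define
$$H_s(n,p,q) = \left\lfloor \frac{n-q}{p} \right\rfloor + \left\lfloor \frac{n-q+1}{p} \right\rfloor.$$
We shall prove that $H(n,p,q) \le H_s(n,p,q)$ for a suitable range of lengths n.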
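The formula for Hs is straightforward to evaluate; a minimal Python sketch follows (the function name `H_s` is ours). Python's floor division `//` implements ⌊·⌋ here, and the guard rejects arguments outside the stated range.

```python
from math import gcd


def H_s(n, p, q):
    """H_s(n,p,q) = floor((n-q)/p) + floor((n-q+1)/p) for coprime 1 < p < q, n >= q."""
    if not (1 < p < q and gcd(p, q) == 1 and n >= q):
        raise ValueError("requires coprime 1 < p < q and n >= q")
    # n - q >= 0, and // is floor division, matching the floor brackets
    return (n - q) // p + (n - q + 1) // p
```

For p = 5 and q = 7 it gives H_s(11,5,7) = 1 and H_s(12,5,7) = H_s(15,5,7) = 2, matching the values of H in Example 1.4, and H_s(16,5,7) = 3, consistent with L(2,5,7) = 16.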

Fine and Wilf [3] constructed a word of length $p + q - 2$ with periods p and q and without period 1. For given p, q we choose such a total word $S_{p,q}$ and define a partial word $W_{p,q}$ as follows, setting $k = \lfloor q/p \rfloor$ (see Fig. 1):
$$W_{p,q} = \bigl(S_{p,q}[0..p-3]\,\diamond\diamond\bigr)^k \; S_{p,q} \; \bigl(\diamond\diamond\,S_{p,q}[q..q+p-3]\bigr)^k.$$
Note that $|W_{p,q}| = 2kp + |S_{p,q}| = (2k+1)p + q - 2$.
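The construction can be sketched in Python. This sketch assumes that each of the 2k blocks is the length-(p−2) factor of $S_{p,q}$ shown above padded with two holes ⋄⋄ (after it on the left, before it on the right), and that $k = \lfloor q/p \rfloor$; the helper names are hypothetical. For $S_{p,q}$ it uses one standard construction: merge positions i and i + p, and i and i + q, inside [0, p+q−2), then write a on the class of position 0 and b on the other class.

```python
HOLE = "⋄"  # illustrative hole symbol


def fine_wilf_word(p, q):
    """Total word of length p + q - 2 with periods p and q but not period 1
    (coprime p, q > 1): merge i with i + p and i with i + q by union-find,
    then write 'a' on the class of position 0 and 'b' elsewhere."""
    n = p + q - 2
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for d in (p, q):
        for i in range(n - d):
            parent[find(i)] = find(i + d)
    root0 = find(0)
    return "".join("a" if find(i) == root0 else "b" for i in range(n))


def w_word(p, q):
    """W_{p,q} under this sketch's assumptions (hypothetical helper):
    k = floor(q/p), blocks S[0..p-3]⋄⋄ on the left, ⋄⋄S[q..q+p-3] on the right."""
    s = fine_wilf_word(p, q)
    k = q // p
    left = (s[: p - 2] + HOLE * 2) * k
    right = (HOLE * 2 + s[q : q + p - 2]) * k
    return left + s + right


def has_period(x, p):
    """Strong period test: non-hole symbols agree within each class mod p."""
    seen = {}
    for i, c in enumerate(x):
        if c != HOLE and seen.setdefault(i % p, c) != c:
            return False
    return True
```

For p = 5 and q = 7 this yields $S_{5,7}$ = ababaababa and $W_{5,7}$ = aba⋄⋄ababaababa⋄⋄aba, a non-unary partial word of length 20 with periods 5 and 7 and 4 = H_s(20,5,7) holes.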

Example 2.1

For p=5 and q=7, we

Functions Hd and Ld

In this section, we study a family of partial words corresponding to the special arrangements introduced in [19]. For relatively prime integers $p, q > 1$, we say that a partial word S of length $n \ge \max(p,q)$ is (p,q)-special if it has a position l such that for each position i:
$$S[i] = \begin{cases} a & \text{if } p \nmid (l-i) \text{ and } q \nmid (l-i),\\ b & \text{if } pq \mid (l-i),\\ \diamond & \text{otherwise.} \end{cases}$$
See Fig. 2 for an example. The (p,q)-special partial words can be characterized as follows.

Observation 3.1

Assume S is a binary partial word over the alphabet {a,b} with coprime periods p, q.
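A (p,q)-special word can be generated directly from the case definition above. The Python sketch below (hypothetical names, holes written as ⋄) builds it for a given anchor position l and takes the minimum number of holes over all anchors in [0, n); reading that minimum as Hd(n,p,q) is an assumption of this sketch, based on Hd being the holes function of this family.

```python
HOLE = "⋄"  # illustrative hole symbol


def special_word(n, p, q, l):
    """The (p,q)-special partial word of length n anchored at position l."""
    chars = []
    for i in range(n):
        d = l - i
        if d % (p * q) == 0:  # pq | (l - i)
            chars.append("b")
        elif d % p == 0 or d % q == 0:  # p | (l - i) or q | (l - i), not pq
            chars.append(HOLE)
        else:  # p does not divide (l - i) and q does not divide (l - i)
            chars.append("a")
    return "".join(chars)


def H_d(n, p, q):
    """Fewest holes over all anchors l in [0, n) (assumed reading of Hd)."""
    return min(special_word(n, p, q, l).count(HOLE) for l in range(n))


def has_period(x, p):
    """Strong period test: non-hole symbols agree within each class mod p."""
    seen = {}
    for i, c in enumerate(x):
        if c != HOLE and seen.setdefault(i % p, c) != c:
            return False
    return True
```

For p = 5 and q = 7 this gives H_d(12,5,7) = 2 but H_d(15,5,7) = 3, whereas Example 1.4 states H(15,5,7) = 2; special words alone do not always attain H, which is why both Hs and Hd appear in the Characterization Theorem.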

First algorithm for computing the length function L

Shur and Gamzova in [19] proved that H(n,p,q) = Hd(n,p,q) for $n \ge 3q + p$. We give a complete characterization of H in terms of Hd and Hs, and we derive an analogous characterization of L in terms of Ld and Ls. The tedious proof, based on a graph-theoretic approach similar to that in [20], is postponed to the last section.

Theorem 4.1 Characterization Theorem

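Let p and q be relatively prime integers such that $2 < p < q$. For each integer $n \ge p + q - 2$, we have
$$H(n,p,q) = \begin{cases} H_s(n,p,q) & \text{if } n \le q + p\lceil q/p \rceil - 1 \text{ or } 3q \le n \le q + 3p - 1,\\ H_d(n,p,q) & \text{otherwise.} \end{cases}$$
Moreover, for each integer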

Number-theoretic tools

A more efficient algorithm for evaluating L relies on the theory of continued fractions; we refer to [25] and [26] for a self-contained yet compact introduction. A finite continued fraction is a sequence $[\gamma_0; \gamma_1, \ldots, \gamma_m]$, where $\gamma_0, m \in \mathbb{Z}_{\ge 0}$ and $\gamma_i \in \mathbb{Z}_{\ge 1}$ for $1 \le i \le m$. We associate it with the following rational number:
$$[\gamma_0; \gamma_1, \ldots, \gamma_m] = \gamma_0 + \cfrac{1}{\gamma_1 + \cfrac{1}{\ddots + \cfrac{1}{\gamma_m}}}.$$
Depending on the parity of m, we distinguish odd and even continued fractions. Often, an improper continued fraction $[\,;\,] = \frac{1}{0}$ is also introduced and assumed to be odd.
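Both directions of this correspondence can be sketched in Python with exact rational arithmetic from the standard `fractions` module: `cf_value` evaluates a finite continued fraction from the innermost level outwards, and `cf_expand` recovers an expansion of a non-negative rational by the Euclidean algorithm (both names are ours). For a non-integer, the expansion returned ends with a term of at least 2; replacing that last term $\gamma_m$ by $\gamma_m - 1, 1$ gives the other expansion of the same number, of opposite parity.

```python
from fractions import Fraction


def cf_value(terms):
    """Rational value of [γ0; γ1, ..., γm], given terms = [γ0, ..., γm]."""
    value = Fraction(terms[-1])
    for g in reversed(terms[:-1]):  # fold from the innermost level outwards
        value = g + 1 / value
    return value


def cf_expand(x):
    """Continued fraction of a rational x >= 0 via the Euclidean algorithm."""
    x = Fraction(x)
    if x < 0:
        raise ValueError("x must be non-negative")
    terms = []
    while True:
        g = x.numerator // x.denominator  # integer part
        terms.append(g)
        rest = x - g
        if rest == 0:
            return terms
        x = 1 / rest
```

For instance, 5/7 = [0; 1, 2, 2] (odd, m = 3) = [0; 1, 2, 1, 1] (even, m = 4).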

Faster algorithm for computing the length function L

The bottleneck of Algorithm 1 is evaluating the function Ld using Lemma 3.4(b). Thus, in this section we apply Fact 5.4 to develop a more efficient characterization of Ld. Let us start with an auxiliary lemma providing upper and lower bounds for the function Hd (whose generalized inverse is Ld).

Lemma 6.1

If nq, thennq+np2npqHd(n,p,q)nq+nqp+q1p1.

Proof

Recall that $H_d(n,p,q) = \min_{l=0}^{n-1} \bigl(G(l,p,q) + G(n-l-1,p,q)\bigr)$, as stated in Lemma 3.4(a). The first part of the claim holds because for $0 \le l < n$ we have

Closed-form formula for L(h,p,q)

In this section, we show how to compute a compact representation of the function $L(h,\cdot,\cdot)$ in $O(h \log h)$ time. We start with such a representation for Ld, which follows from combining Lemmas 6.2 and 3.6.

Recall the basic points and middle points defined in Section 3.1. Here, we use (h+2)-basic points and (h+2)-middle points defined for $0 < i < h+4$ as, respectively,
$$l_i = \frac{i-1}{h+4-i}, \qquad m_i = \frac{i}{h+4-i}.$$
Note that for each i, we have $l_i \le \mathrm{Left}_{h+3}(m_i) \le m_i \le \mathrm{Right}_{h+3}(m_i) \le l_{i+1}$, but none of the inequalities is strict in

Relation to standard Sturmian words

For a finite sequence $\gamma = (\gamma_1, \ldots, \gamma_m)$ of positive integers, a Sturmian word $\mathrm{St}(\gamma)$ is recursively defined as $X_m$, where $X_{-1} = Q$, $X_0 = P$, and $X_i = X_{i-1}^{\gamma_i} X_{i-2}$ for $1 \le i \le m$; see [28, Chapter 2]. The sequence γ is called the directive sequence of the Sturmian word St(γ). We classify directive sequences γ (and the Sturmian words St(γ)) into even and odd based on the parity of m.

Observation 8.1

Odd Sturmian words of length at least 2 end with PQ, while even Sturmian words of length at least 2 end with QP.
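The recursive definition translates directly into code. The Python sketch below (hypothetical name `sturmian`, with P and Q taken as single letters for illustration) builds St(γ) and then checks Observation 8.1 on all directive sequences of length 1 to 4 with entries from 1 to 3.

```python
from itertools import product


def sturmian(gamma, P="P", Q="Q"):
    """St(γ) = X_m, where X_{-1} = Q, X_0 = P and X_i = X_{i-1}^{γ_i} X_{i-2}."""
    x_prev2, x_prev1 = Q, P  # X_{i-2} and X_{i-1} before step i = 1
    for g in gamma:
        x_prev2, x_prev1 = x_prev1, x_prev1 * g + x_prev2
    return x_prev1


# Observation 8.1: odd directive sequences give words ending with PQ,
# even ones (of length >= 2) give words ending with QP.
for m in range(1, 5):
    for gamma in product(range(1, 4), repeat=m):
        word = sturmian(gamma)
        assert word.endswith("PQ" if m % 2 == 1 else "QP"), gamma
```

For example, St(1, 1) = PQP and St(2, 1) = PPQP, both even and ending with QP.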

For a directive sequence γ=

Proof of the characterization theorem

Let us define the (n,p,q)-graph G=(V,E) as an undirected graph with vertices $V = \{0, \ldots, n-1\}$. The vertices i and j are connected if and only if $p \mid (j-i)$ or $q \mid (j-i)$. Observe that H(n,p,q) is the minimum size of a vertex separator in G, i.e., the minimum number of vertices to be removed from G so that the resulting graph is no longer connected; see Fig. 6.

We say that an edge (i,j) of the (n,p,q)-graph is a p-edge if $p \mid (j-i)$ and a q-edge if $q \mid (j-i)$. The set of all nodes giving the same remainder modulo

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (28)

  • T. Kociumaka et al., On periodicity lemma for partial words.
  • M. Lothaire, Combinatorics on Words (1997).
  • N.J. Fine et al., Uniqueness theorems for periodic functions, Proc. Am. Math. Soc. (1965).
  • J. Justin, On a paper by Castelli, Mignosi, Restivo, RAIRO Theor. Inform. Appl. (2000).

Supported by the Polish National Science Center, grant no. 2014/13/B/ST6/00770. A preliminary version of this work appeared at LATA 2018 [1].