Elsevier

Theoretical Computer Science

Volume 847, 22 December 2020, Pages 27-38

Revisiting the parameterized complexity of Maximum-Duo Preservation String Mapping

https://doi.org/10.1016/j.tcs.2020.09.034

Abstract

In the Maximum-Duo Preservation String Mapping (Max-Duo PSM) problem, the input consists of two related strings A and B of length n and a nonnegative integer k. The objective is to determine whether there exists a mapping m from the set of positions of A to the set of positions of B that maps only to positions with the same character and preserves at least k duos, which are pairs of adjacent positions. We develop a randomized algorithm that solves Max-Duo PSM in $4^k \cdot n^{O(1)}$ time, and a deterministic algorithm that solves this problem in $6.855^k \cdot n^{O(1)}$ time. The previous best known (deterministic) algorithm for this problem has $(8e)^{2k+o(k)} \cdot n^{O(1)}$ running time [Beretta et al. (2016) [1], [2]]. We also show that Max-Duo PSM admits a problem kernel of size $O(k^3)$, improving upon the previous best known problem kernel of size $O(k^6)$.

Introduction

Computing distances between strings is a fundamental task in computer science. For many distance measures, the distance between two strings A and B is defined as the minimum number of local operations that are needed to transform A into B, for example, deleting or inserting a character. For these measures, the distance between two strings A and B can usually be computed in polynomial time [13], [23]. In some applications, however, it is necessary to consider nonlocal operations that transform one string into the other. In comparative genomics, for example, genomes are modeled as strings in which each character corresponds to a complete gene, and one is interested in determining the evolutionary distance between two genomes. During biological evolution, genomes may be altered by large-scale mutations such as the reversal or the transposition of larger parts of the genome [19].

One approach to approximating the distance between two strings A and B with respect to many of these operations is to compute a smallest common string partition [11], [27]. Informally, a size-$\ell$ common string partition of two strings A and B is a partition of A and B, each into $\ell$ nonoverlapping substrings, such that the resulting two multisets of substrings of A and B are the same. The problem of computing a smallest common string partition, known as Minimum Common String Partition, is NP-hard [11], [22].
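The definition above can be made concrete with a small checker. The sketch below is an illustrative Python helper (the name `is_common_string_partition` and the list-of-blocks encoding are assumptions, not taken from the paper): it verifies that each list of blocks concatenates back to its string and that both lists contain the same multiset of blocks; the partition size is then the number of blocks.

```python
from collections import Counter


def is_common_string_partition(a_parts: list[str], b_parts: list[str], a: str, b: str) -> bool:
    """Return True if a_parts and b_parts form a common string partition of a and b."""
    # Blocks must be nonempty and concatenate back to their string,
    # i.e., they are consecutive, nonoverlapping, and cover the whole string.
    if any(p == "" for p in a_parts + b_parts):
        return False
    if "".join(a_parts) != a or "".join(b_parts) != b:
        return False
    # Both sides must yield the same multiset of substrings (order is irrelevant).
    return Counter(a_parts) == Counter(b_parts)


# Size-2 common string partition of A = "abcd" and B = "cdab".
print(is_common_string_partition(["ab", "cd"], ["cd", "ab"], "abcd", "cdab"))  # True
# The single blocks "abcd" and "cdab" differ, so this is not a common partition.
print(is_common_string_partition(["abcd"], ["cdab"], "abcd", "cdab"))  # False
```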

An alternative way of defining such a partition is to ask for a partition of A into nonoverlapping substrings such that permuting the order of these substrings and then concatenating them gives the string B. This second view implies a mapping m that bijectively maps each position i of A to a position m(i) of B such that $A[i] = B[m(i)]$. The size of the common string partition is then exactly one plus the number of pairs of consecutive positions i and i+1 (called duos) such that $m(i) + 1 \neq m(i+1)$, since each such i is the end of one part and i+1 is the start of the next part. Because A has $n-1$ duos, a mapping that preserves k of them yields a common string partition of size $n-k$. Therefore, computing a mapping m that maps only positions with the same character to each other and maximizes the number k of duos with $m(i) + 1 = m(i+1)$ directly yields a minimum common string partition of A and B. The problem of computing such a mapping is known as Maximum-Duo Preservation String Mapping (Max-Duo PSM). Since Max-Duo PSM is the dual of Minimum Common String Partition, it is NP-hard as well. Motivated by this hardness, we study Max-Duo PSM from the viewpoint of parameterized algorithmics. More precisely, we aim for efficient algorithms when the parameter is k, the number of preserved duos. Before describing previous work and our results, we give a formal problem definition.
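The correspondence between mappings and partitions described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from the paper: positions are 0-indexed (the text uses 1-indexed positions), m is a list with `m[i]` the image of position i, and the helper names are hypothetical. The sketch cuts A after every broken duo, so the number of blocks equals n minus the number of preserved duos.

```python
def preserved_duos(m: list[int]) -> int:
    """Count duos (i, i+1) with m[i] + 1 == m[i+1] (positions are 0-indexed)."""
    return sum(1 for i in range(len(m) - 1) if m[i] + 1 == m[i + 1])


def blocks_from_mapping(a: str, b: str, m: list[int]) -> list[str]:
    """Cut A after every broken duo; the blocks form a common string partition with B."""
    n = len(a)
    if sorted(m) != list(range(n)):
        raise ValueError("m is not a bijection on the positions")
    if any(a[i] != b[m[i]] for i in range(n)):
        raise ValueError("m maps a position to a different character")
    blocks, start = [], 0
    for i in range(n - 1):
        # A broken duo ends the current block at i; the next block starts at i + 1.
        if m[i] + 1 != m[i + 1]:
            blocks.append(a[start:i + 1])
            start = i + 1
    blocks.append(a[start:])
    return blocks


m = [2, 3, 0, 1]  # maps A = "abcd" into B = "cdab"
print(preserved_duos(m))                       # 2
print(blocks_from_mapping("abcd", "cdab", m))  # ['ab', 'cd']
```

Here two of the three duos are preserved, so the resulting partition has $4 - 2 = 2$ blocks.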

Formal problem definition.  Let A and B be two strings over a finite set of symbols Σ. Throughout this work, we assume that $|A| = |B| = n$ and that A and B are related, that is, B is a permutation of A. A mapping of A into B is a bijective function $m\colon [n] \to [n]$ such that $A[i] = B[m(i)]$ for each $i \in [n]$. A duo in A is a pair of consecutive positions (i, i+1) of A. We say that a mapping m preserves a duo (i, i+1) if $m(i) + 1 = m(i+1)$. Accordingly, the Max-Duo PSM problem is defined as follows.

Maximum-Duo Preservation String Mapping (Max-Duo PSM)

Input: Two related strings, A and B, and a nonnegative integer k.

Question: Does there exist a (bijective) mapping m of A into B such that the number of preserved duos is at least k?
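To illustrate the decision question on tiny inputs, the following Python sketch solves Max-Duo PSM by exhaustive search over all character-respecting bijections. It is exponential and is not the paper's algorithm; the function names are hypothetical and positions are 0-indexed.

```python
from itertools import permutations, product


def max_preserved_duos(a: str, b: str) -> int:
    """Exhaustively try every character-respecting bijection (0-indexed positions)."""
    if sorted(a) != sorted(b):
        raise ValueError("A and B must be related (B a permutation of A)")
    chars = sorted(set(a))
    pos_a = {c: [i for i, x in enumerate(a) if x == c] for c in chars}
    pos_b = {c: [j for j, x in enumerate(b) if x == c] for c in chars}
    best = 0
    # One mapping = one permutation of B's occurrences for each character.
    for choice in product(*(permutations(pos_b[c]) for c in chars)):
        m = [0] * len(a)
        for c, targets in zip(chars, choice):
            for i, j in zip(pos_a[c], targets):
                m[i] = j
        best = max(best, sum(m[i] + 1 == m[i + 1] for i in range(len(a) - 1)))
    return best


def max_duo_psm(a: str, b: str, k: int) -> bool:
    """Decision version: is there a mapping preserving at least k duos?"""
    return max_preserved_duos(a, b) >= k


print(max_preserved_duos("abab", "baba"))  # 2
print(max_duo_psm("abab", "baba", 3))      # False
```

For A = "abab" and B = "baba", the best mapping corresponds to the common string partition "aba" + "b" of A and "b" + "aba" of B, which preserves two duos.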

Previous work.  Max-Duo PSM was initially proposed as an alternative route toward approximation algorithms for Minimum Common String Partition (MCSP) [10], because the best known polynomial-time approximation algorithm for MCSP has an approximation factor of $O(\log n \log^* n)$ [12]. Consequently, most work on Max-Duo PSM focuses on approximation algorithms; the first constant-factor approximation algorithm achieves an approximation factor of 4 [6]. This was subsequently improved to a factor of 3.5 [5] and then to a factor of 3.25 [7]. Recently, further improvements of the approximation factor have been reported [18], [28].

Beretta et al. [2], [1] initiated the study of Max-Duo PSM from the viewpoint of parameterized algorithmics. They studied both the fixed-parameter tractability and the kernelization complexity of Max-Duo PSM, showing that the problem can be solved in $(8e)^{2k+o(k)} \cdot n^{O(1)}$ time and that it admits a kernel of size $O(k^6)$. Thus, they were the first to show that Max-Duo PSM is FPT and that it admits a polynomial kernel. Their fixed-parameter algorithm combines color coding with dynamic programming.

In comparison with Max-Duo PSM, MCSP has been investigated more thoroughly from the viewpoint of parameterized algorithms. Damaschke [15] presented the first fixed-parameter algorithms for MCSP, for combined parameters such as “partition size plus repetition number of the input strings”. Subsequently, MCSP was shown to be fixed-parameter tractable with respect to the single parameter partition size [9]. Jiang et al. [24] considered the combined parameter “partition size plus maximum occurrence d of any character” and showed that MCSP can be solved in $(d!)^k \cdot n^{O(1)}$ time. This running time was later improved to $O(d^{2k} kn)$ [8].

Our contribution.  We make two main contributions. First, we develop two algorithms for Max-Duo PSM that are substantially faster than the (deterministic) $(8e)^{2k+o(k)} \cdot n^{O(1)}$-time algorithm by Beretta et al. [2], [1]. Specifically, we develop a randomized algorithm that solves Max-Duo PSM in $4^k \cdot n^{O(1)}$ time, as well as a deterministic algorithm that solves it in $6.855^k \cdot n^{O(1)}$ time. For the randomized algorithm, this means that if it reports a yes-instance, the answer is always correct, and if it reports a no-instance, the answer is correct with probability at least 9/10. To develop our algorithms, we present a reduction, which might be of independent interest, from Max-Duo PSM to a problem of finding paths in an edge-colored graph. This reduction lies at the heart of our algorithms: advanced tools from parameterized algorithmics, namely the methods of narrow sieves [4], [3] and representative sets [20], solve the resulting graph problem quickly.

Second, we prove that Max-Duo PSM admits a kernel of size $O(k^3)$, improving upon the kernel of size $O(k^6)$ by Beretta et al. [2].
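The one-sided error guarantee of the randomized algorithm can be strengthened by standard independent repetition: since a yes answer is never wrong, one accepts as soon as any run accepts, and on a yes-instance all t runs err with probability at most $(1/10)^t$. The sketch below assumes a hypothetical callable `run_once` standing in for one independent run of the randomized algorithm; it is a generic illustration, not part of the paper.

```python
from typing import Callable


def amplify(run_once: Callable[[], bool], repetitions: int) -> bool:
    """Accept if any of `repetitions` independent runs accepts (one-sided error)."""
    # A "yes" from a single run is always correct, so one accepting run suffices;
    # any() stops at the first True and skips the remaining runs.
    return any(run_once() for _ in range(repetitions))


def false_negative_bound(repetitions: int, per_run_error: float = 0.1) -> float:
    """Upper bound on P[all runs answer "no" on a yes-instance], for independent runs."""
    return per_run_error ** repetitions


# Hypothetical stand-in for three runs: the first two miss, the third accepts.
answers = iter([False, False, True])
print(amplify(lambda: next(answers), 3))  # True
print(false_negative_bound(3))            # about 0.001
```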

Preliminaries.  We use $[i,j]$ to denote the set $\{i, i+1, \ldots, j\}$ of natural numbers between i and j. Moreover, given a string A, we denote the substring starting at position i and ending at position j by $A[i,j]$. For a (directed) graph G, let V(G) denote the vertex set of G and E(G) the edge set of G.

The field of parameterized algorithmics studies parameterized problems, where each problem instance is associated with a parameter k, usually a nonnegative integer. Given a parameterized problem, the first question is whether the problem is fixed-parameter tractable (FPT), that is, whether it can be solved in $f(k) \cdot |X|^{O(1)}$ time, where f is an arbitrary function that depends only on k and |X| is the size of the input instance. In other words, the notion of FPT signifies that the combinatorial explosion can be confined to the parameter k. A second question is whether the problem also admits a polynomial kernelization. Here, a problem Π is said to admit a polynomial kernelization if there exists a polynomial-time algorithm that, given an instance (X, k) of Π, outputs an equivalent instance $(\hat{X}, \hat{k})$ of Π, called a kernel, where $|\hat{X}| = \hat{k}^{O(1)}$ and $\hat{k} \le k$. Kernelization thus provides a formal, rigorous framework for analyzing preprocessing procedures. For further details, refer to [17], [14], [21].


Reduction to a path finding problem

In this section, we present a reduction from Max-Duo PSM to the following graph problem.

Substantially Blue Path

Input: A directed acyclic graph (DAG) G, an edge-coloring $c\colon E(G) \to \{R, B\}$, a vertex-labeling $\ell\colon V(G) \to \mathbb{N}$, and nonnegative integers k and r.

Question: Does G contain a directed path P such that

  • $|V(P)| \le r$,

  • $\ell(u) \neq \ell(v)$ for all distinct $u, v \in V(P)$, and

  • $|\{e \in E(P) : c(e) = B\}| \ge k$.
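The three conditions can be checked directly for a candidate path. The Python sketch below is illustrative only: it assumes the size condition reads $|V(P)| \le r$, encodes the coloring c as a dict from directed edges (u, v) to "R" or "B" and the labeling as a dict from vertices to natural numbers, and uses a hypothetical function name.

```python
def is_substantially_blue(path: list, color: dict, label: dict, k: int, r: int) -> bool:
    """Check whether the vertex sequence `path` witnesses a yes-answer."""
    edges = list(zip(path, path[1:]))
    # P must be a directed path of G: consecutive vertices joined by edges.
    if any(e not in color for e in edges):
        return False
    # At most r vertices.
    if len(path) > r:
        return False
    # Pairwise distinct labels (this also forces distinct vertices).
    labels = [label[v] for v in path]
    if len(set(labels)) != len(labels):
        return False
    # At least k blue edges.
    return sum(color[e] == "B" for e in edges) >= k


color = {(1, 2): "B", (2, 3): "R", (3, 4): "B"}
label = {1: 1, 2: 2, 3: 3, 4: 1}
print(is_substantially_blue([1, 2, 3], color, label, k=1, r=3))     # True
print(is_substantially_blue([1, 2, 3, 4], color, label, k=2, r=4))  # False (label 1 repeats)
```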

Construction.  Let (A, B, k) be an instance of Max-Duo PSM. We construct an instance $(G, c, \ell, k, r)$ of Substantially Blue Path as follows (here,

A randomized algorithm based on narrow sieves

In this section, we adapt the method of narrow sieves, which was applied to the k-Path problem [4], to solve Substantially Blue Path. More precisely, our objective is to provide a constructive proof of the following result.

Lemma 4

There exists a randomized algorithm that solves Substantially Blue Path in $2^r \cdot r^{O(1)} \cdot |E(G)|$ time and polynomial space.

In light of Lemma 3, once we have Lemma 4 at hand, we immediately obtain the following theorem.

Theorem 1

There exists a randomized algorithm that solves Max-Duo PSM

Deterministic algorithm: representative sets

In this section, we adapt the method of representative sets, as applied to the k-Path problem [20], to Substantially Blue Path. More precisely, our objective is to provide a constructive proof of the following result.

Lemma 10

There exists a deterministic algorithm that solves Substantially Blue Path in $O\big(\big(\tfrac{1+\sqrt{5}}{2}\big)^{r+o(r)} \cdot |E(G)| \log |E(G)|\big)$ time.
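For intuition about what the representative-sets method speeds up, the sketch below is a plain dynamic program over pairs (end vertex, set of labels used), processed in topological order of the DAG. It is explicitly not the algorithm behind Lemma 10: the representative-sets method would additionally prune each vertex's family of label sets to a small representative subfamily, whereas this unpruned version may store exponentially many sets. It assumes $r \ge 1$, the size condition $|V(P)| \le r$, every vertex appearing as a key of `label`, and edge colors stored in a dict from (u, v) to "R" or "B"; the function name is hypothetical. It requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def has_substantially_blue_path(color: dict, label: dict, k: int, r: int) -> bool:
    """Exact (unpruned) DP for Substantially Blue Path on a DAG; exponential in general.

    color: maps each directed edge (u, v) to "R" or "B".
    label: maps every vertex to a natural number.
    """
    succ = {v: [] for v in label}
    preds = {v: set() for v in label}
    for u, v in color:
        succ[u].append(v)
        preds[v].add(u)
    # table[v][S] = most blue edges on a path ending at v whose label set is S.
    table = {v: ({frozenset([label[v]]): 0} if r >= 1 else {}) for v in label}
    # In topological order, all paths ending at v are known before v is extended.
    for v in TopologicalSorter(preds).static_order():
        for w in succ[v]:
            blue = 1 if color[(v, w)] == "B" else 0
            for used, b in table[v].items():
                # Enforce distinct labels and the bound |V(P)| <= r.
                if label[w] in used or len(used) + 1 > r:
                    continue
                key = used | {label[w]}
                if table[w].get(key, -1) < b + blue:
                    table[w][key] = b + blue
    return any(b >= k for entries in table.values() for b in entries.values())


color = {(1, 2): "B", (2, 3): "R", (3, 4): "B"}
label = {1: 1, 2: 2, 3: 3, 4: 4}
print(has_substantially_blue_path(color, label, k=2, r=4))  # True
print(has_substantially_blue_path(color, label, k=2, r=3))  # False
```

The number of label sets stored per vertex can grow like the number of label subsets of size at most r; bounding this family is exactly where representative sets come in.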

In light of Lemma 3, once we have Lemma 10 at hand, we directly obtain the following theorem.

Theorem 2

There exists a deterministic algorithm that solves Max-Duo PSM in O((1+52)

A cubic problem kernel

In this section, we show that Max-Duo PSM admits a kernel of size $O(k^3)$. Let (A, B, k) be an instance of Max-Duo PSM, and let $S \in \{A, B\}$. If S = A, then we let $\bar{S} = B$; analogously, if S = B, then we let $\bar{S} = A$.

Let m be a map of S into $\bar{S}$, and let D be a set of duos. We denote by $m(D) = \{(m(i), m(i+1)) \mid (i, i+1) \in D\}$ the image of D under m. We say that m preserves D if m preserves each duo in D. Let $C_A$ and $C_B$ be sets of duos. We say that the pair $(C_A, C_B)$ is complete for (A, B, k) if whenever there is a map m of A
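The image of a set of duos and the notion of preserving it translate directly into code. A minimal sketch, assuming a map is stored as a Python dict from 1-indexed positions of one string to positions of the other and that each duo is a pair (i, i + 1); both helper names are hypothetical.

```python
def image(m: dict, duos: set) -> set:
    """m(D) = {(m(i), m(i+1)) | (i, i+1) in D}."""
    return {(m[i], m[j]) for i, j in duos}


def preserves(m: dict, duos: set) -> bool:
    """m preserves D if it preserves every duo (i, i+1) in D, i.e. m(i) + 1 == m(i+1)."""
    return all(m[i] + 1 == m[j] for i, j in duos)


# 1-indexed map of A = "abcd" into B = "cdab".
m = {1: 3, 2: 4, 3: 1, 4: 2}
print(image(m, {(2, 3)}))              # {(4, 1)}
print(preserves(m, {(1, 2), (3, 4)}))  # True
print(preserves(m, {(2, 3)}))          # False
```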

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (29)

  • L. Bulteau et al. A fixed-parameter algorithm for minimum common string partition with few duplications.

  • L. Bulteau et al. Minimum common string partition parameterized by partition size is fixed-parameter tractable.

  • X. Chen et al. Assignment of orthologous genes via genome rearrangement. IEEE/ACM Trans. Comput. Biol. Bioinform. (2005).

  • G. Cormode et al. The string edit distance matching problem with moves. ACM Trans. Algorithms (2007).
A preliminary version of this paper appeared in the proceedings of CPM 2017.