Revisiting the parameterized complexity of Maximum-Duo Preservation String Mapping☆
Introduction
Computing distances between strings is a fundamental task in computer science. For many distance measures, the distance between two strings A and B is defined as the minimum number of local operations that are needed to transform A into B, for example the deletion or insertion of a character. For these measures, the distance between two strings A and B can usually be computed in polynomial time [13], [23]. In some applications, however, it is necessary to consider nonlocal operations that transform one string into the other. In comparative genomics, for example, genomes are modeled as strings in which each character corresponds to a complete gene, and one is interested in determining the evolutionary distance between two genomes. During biological evolution, genomes may be altered by large-scale mutations such as the reversal or the transposition of larger parts of the genome [19].
One approach to approximate the distance between two strings A and B with respect to many of these operations is to compute a smallest common string partition [11], [27]. Informally, a size-ℓ common string partition of two strings A and B is a partition of A and B, each into ℓ nonoverlapping substrings, such that the resulting two multisets of substrings of A and B are the same. The problem of computing a smallest common string partition, known as Minimum Common String Partition, is NP-hard [11], [22].
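The informal definition above can be checked mechanically. The following is a minimal sketch, not taken from the paper; the function name `is_common_string_partition` and the representation of a partition as a list of blocks are assumptions made for illustration.

```python
from collections import Counter


def is_common_string_partition(a, b, blocks_a, blocks_b):
    """Check the informal definition of a common string partition.

    blocks_a and blocks_b are lists of nonempty substrings (hypothetical
    representation). They form a common string partition of a and b if
    concatenating blocks_a gives a, concatenating blocks_b gives b, and
    both lists contain the same multiset of blocks.
    """
    covers_a = "".join(blocks_a) == a
    covers_b = "".join(blocks_b) == b
    nonempty = all(blocks_a) and all(blocks_b)
    same_multiset = Counter(blocks_a) == Counter(blocks_b)
    return covers_a and covers_b and nonempty and same_multiset
```

The size ℓ of such a partition is simply `len(blocks_a)`; the check itself is easy, and the hardness lies in finding a partition of minimum size.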
An alternative way of defining such a partition is to ask for a partition of A into ℓ nonoverlapping substrings such that permuting these substrings and then concatenating them gives the string B. This second view implies a mapping m that (bijectively) maps each position i of A to a position $m(i)$ of B such that $A[i] = B[m(i)]$. The size of the common string partition is then exactly the number of pairs of consecutive positions i and $i+1$ (called duos) such that $m(i+1) \neq m(i)+1$, plus one, since i is the end of one part and $i+1$ is the start of the next part. Therefore, computing a mapping m that maps only positions with the same characters to each other and maximizes the number k of consecutive positions i and $i+1$ for which $m(i+1) = m(i)+1$ directly yields a minimum common string partition of A and B. The problem of computing such a mapping is known as Maximum-Duo Preservation String Mapping (Max-Duo PSM). Since Max-Duo PSM is simply a dual of the Minimum Common String Partition problem, it is NP-hard as well. Motivated by this hardness, we study Max-Duo PSM from the viewpoint of parameterized algorithmics. More precisely, our aim is to obtain efficient algorithms when the parameter is k, the number of preserved duos. Before describing previous work and our results, we give a formal problem definition.
Formal problem definition. Let A and B be two strings over a finite set of symbols Σ. Throughout this work, we assume that $|A| = |B| = n$ and that A and B are related, that is, B is a permutation of A. A mapping of A into B is a (bijective) function $m : [1, n] \to [1, n]$ where for each $i \in [1, n]$, $A[i] = B[m(i)]$. A duo in A is a pair $(i, i+1)$ of consecutive positions of A. We say that a mapping m preserves a duo $(i, i+1)$ if $m(i+1) = m(i) + 1$. Accordingly, the Max-Duo PSM problem is defined as follows.
Maximum-Duo Preservation String Mapping (Max-Duo PSM)
Input: Two related strings, A and B, and a nonnegative integer k.
Question: Does there exist a (bijective) mapping m of A into B such that the number of preserved duos is at least k?
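To make the definition concrete, the following is a minimal brute-force sketch, not the algorithm of this paper: it enumerates all bijections and is therefore only usable on very short strings. The function names are hypothetical, and positions are 0-indexed here, unlike in the text.

```python
from itertools import permutations


def preserved_duos(m):
    """Count duos (i, i+1) with m[i+1] == m[i] + 1 (0-indexed list m)."""
    return sum(1 for i in range(len(m) - 1) if m[i + 1] == m[i] + 1)


def partition_from_mapping(a, m):
    """Cut a after every non-preserved duo; returns the resulting blocks."""
    blocks, start = [], 0
    for i in range(len(m) - 1):
        if m[i + 1] != m[i] + 1:
            blocks.append(a[start:i + 1])
            start = i + 1
    blocks.append(a[start:])
    return blocks


def max_duos_bruteforce(a, b):
    """Try every bijection m with a[i] == b[m[i]]; exponential time."""
    if sorted(a) != sorted(b):
        raise ValueError("a and b must be related (permutations of each other)")
    best, best_m = -1, None
    for perm in permutations(range(len(b))):
        if all(a[i] == b[perm[i]] for i in range(len(a))):
            duos = preserved_duos(perm)
            if duos > best:
                best, best_m = duos, list(perm)
    return best, best_m


def is_yes_instance(a, b, k):
    """Decision version: is there a mapping preserving at least k duos?"""
    return max_duos_bruteforce(a, b)[0] >= k
```

For A = abcab and B = ababc, the search finds a mapping preserving 3 duos, and cutting A at the single non-preserved duo gives the blocks abc and ab. This matches the duality described in the introduction: a string of length n has n − 1 duos, so a mapping preserving k of them yields a common string partition of size (n − 1 − k) + 1 = n − k, here 5 − 3 = 2.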
Previous work. Max-Duo PSM was initially proposed as an alternative route to approximation algorithms for Minimum Common String Partition (MCSP) [10], because the best known polynomial-time approximation algorithm has an approximation factor of [12]. Consequently, most work on Max-Duo PSM focuses on approximation algorithms, with the first constant-factor approximation algorithm achieving an approximation factor of 4 [6]. This was subsequently improved to a factor of 3.5 [5] and then to a factor of 3.25 [7]. Recently, further progress on the approximation factor has been reported [18], [28].
Beretta et al. [2], [1] initiated the study of Max-Duo PSM from the viewpoint of parameterized algorithmics. They studied both the fixed-parameter tractability and the kernelization complexity of Max-Duo PSM, showing that this problem can be solved in time, and that it admits a kernel of size . Thus, Beretta et al. [2], [1] were the first to show that Max-Duo PSM is FPT and that it admits a polynomial kernel. The fixed-parameter algorithm of Beretta et al. [2], [1] is based on a combination of color coding and dynamic programming.
In comparison with Max-Duo PSM, MCSP has been investigated more thoroughly from the viewpoint of parameterized algorithms. Damaschke [15] presented the first fixed-parameter algorithms for MCSP, for combined parameters such as “partition size ℓ plus repetition number of the input strings”. Subsequently, MCSP was shown to be fixed-parameter tractable with the single parameter partition size ℓ [9]. Jiang et al. [24] considered the combined parameter “partition size ℓ plus maximum occurrence d of any character” and showed that MCSP can be solved in time. Subsequently, this running time was improved to [8].
Our contribution. We make two main contributions. First, we develop two algorithms for the Max-Duo PSM problem that are substantially faster than the (deterministic) algorithm by Beretta et al. [2], [1], which runs in time. Specifically, we develop a randomized algorithm that solves Max-Duo PSM in time, as well as a deterministic algorithm that solves this problem in time. Here, the randomized algorithm has one-sided error: if it determines that the input is a yes-instance, then this answer is necessarily correct, and if it determines that the input is a no-instance, then this answer is correct with probability at least 9/10. For the purpose of developing our algorithms, we present a reduction from Max-Duo PSM to a problem of finding paths in an edge-colored graph, which might be of independent interest. This reduction lies at the heart of our algorithms: by employing advanced tools from the field of parameterized algorithmics, namely the methods of narrow sieves [4], [3] and representative sets [20], it is possible to quickly solve the resulting graph problem.
Second, we prove that Max-Duo PSM admits a kernel of size $O(k^3)$, improving upon the polynomial kernel of Beretta et al. [2].
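The one-sided error guarantee of the randomized algorithm can be strengthened by independent repetition. The following is a standard calculation and not a statement from the paper; it reads the guarantee as "on a yes-instance, a single run answers no with probability at most 1/10", and the symbol $t$ (the number of independent runs) is introduced here for illustration. If we answer yes as soon as any run answers yes, the only possible error is that all $t$ runs answer no on a yes-instance:

```latex
\[
  \Pr\bigl[\text{all } t \text{ runs answer no} \;\big|\; \text{yes-instance}\bigr]
  \;\le\; \left(\frac{1}{10}\right)^{t} .
\]
```

Under this reading, three runs already push the error probability down to at most 1/1000, at the cost of a factor of $t$ in the running time.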
Preliminaries. We use $[i, j]$ to denote the set of natural numbers between i and j. Moreover, given a string A, we denote the substring starting at position i and ending at position j by $A[i, j]$. For a (directed) graph G, let $V(G)$ denote the vertex set of G and $E(G)$ the edge set of G.
The field of parameterized algorithmics studies parameterized problems, where each problem instance is associated with a parameter k, usually a nonnegative integer. Given a parameterized problem, the first question is whether the problem is fixed-parameter tractable (FPT), that is, whether it can be solved in $f(k) \cdot |I|^{O(1)}$ time, where f is an arbitrary function that depends only on k and $|I|$ is the size of the input instance. In other words, the notion of FPT signifies that the combinatorial explosion can be confined to the parameter k. A second question is whether the problem also admits a polynomial kernelization. Here, a problem Π is said to admit a polynomial kernelization if there exists a polynomial-time algorithm that, given an instance $(I, k)$ of Π, outputs an equivalent instance $(I', k')$ of Π, called a kernel, where $|I'| \le k^{O(1)}$ and $k' \le k^{O(1)}$. Kernelization is a mathematical concept that aims to analyze preprocessing procedures in a formal, rigorous manner. For further details, refer to [17], [14], [21].
Reduction to a path finding problem
In this section, we present a reduction from Max-Duo PSM to the following graph problem.
Substantially Blue Path
Input: A directed acyclic graph (DAG) G, an edge-coloring , a vertex-labeling , and nonnegative integers k and r.
Question: Does G contain a directed path P such that
- •
,
- •
for all , , and
- •
.
Construction. Let $(A, B, k)$ be an instance of Max-Duo PSM. We construct an instance of Substantially Blue Path as follows (here,
A randomized algorithm based on narrow sieves
In this section, we adapt the method of narrow sieves that was applied to solve the k-Path problem [4] to solve Substantially Blue Path. More precisely, our objective is to provide a constructive proof for the following result.
Lemma 4 There exists a randomized algorithm that solves Substantially Blue Path in time and polynomial space.
In light of Lemma 3, once we have Lemma 4 at hand, we immediately obtain the following theorem.
Theorem 1 There exists a randomized algorithm that solves Max-Duo PSM
Deterministic algorithm: representative sets
In this section, we adapt the approach in which the method of representative sets is applied to solve the k-Path problem [20]. More precisely, our objective is to provide a constructive proof for the following result.
Lemma 10 There exists a deterministic algorithm that solves Substantially Blue Path in time.
In light of Lemma 3, once we have Lemma 10 at hand, we directly obtain the following theorem.
Theorem 2 There exists a deterministic algorithm that solves Max-Duo PSM in
A cubic problem kernel
In this section, we show that Max-Duo PSM admits a kernel of size $O(k^3)$. Let $(A, B, k)$ be an instance of Max-Duo PSM, and let . If , then we let . Analogously, if , then we let .
Let m be a map of S into , and let D be a set of duos. We denote by the image of D under m. We say that m preserves D if m preserves each duo in D. Let and be sets of duos. We say that the pair is complete for if whenever there is a map m of A
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (29)
- et al., Corrigendum to “Parameterized tractability of the maximum-duo preservation string mapping problem” [Theoret. Comput. Sci. 646 (2016) 16–25], Theor. Comput. Sci. (2016)
- et al., Parameterized tractability of the maximum-duo preservation string mapping problem, Theor. Comput. Sci. (2016)
- et al., Narrow sieves for parameterized paths and packings, J. Comput. Syst. Sci. (2017)
- et al., Solving the maximum duo-preservation string mapping problem with linear programming, Theor. Comput. Sci. (2014)
- et al., A probabilistic remark on algebraic program testing, Inf. Process. Lett. (1978)
- et al., Representative families: a unified tradeoff-based approach, J. Comput. Syst. Sci. (2016)
- Determinant sums for undirected hamiltonicity, SIAM J. Comput. (2014)
- et al., A 7/2-approximation algorithm for the maximum duo-preservation string mapping problem
- et al., Improved approximation for the maximum duo-preservation string mapping problem
- Further improvement in approximating the maximum duo-preservation string mapping problem
- A fixed-parameter algorithm for minimum common string partition with few duplications
- Minimum common string partition parameterized by partition size is fixed-parameter tractable
- Assignment of orthologous genes via genome rearrangement, IEEE/ACM Trans. Comput. Biol. Bioinform.
- The string edit distance matching problem with moves, ACM Trans. Algorithms
☆ A preliminary version of this paper appeared in the proceedings of CPM 2017.