• arXiv.cs.DS Pub Date : 2020-01-16
Kohei Yamada; Yuto Nakashima; Shunsuke Inenaga; Hideo Bannai; Masayuki Takeda

The longest common subsequence (LCS) problem is a central problem in stringology: find the longest common subsequence of two given strings $A$ and $B$. Chen and Chao [J. Comb. Optim., 2011] proposed a set of four constrained LCS problems, called the generalized constrained LCS problems. In this paper, we consider the substring-excluding constrained LCS (STR-EC-LCS) problem. A string $Z$ is said to be an STR-EC-LCS of two given strings $A$ and $B$ excluding $P$ if $Z$ is one of the longest common subsequences of $A$ and $B$ that do not contain $P$ as a substring. Wang et al. proposed a dynamic programming solution that computes an STR-EC-LCS in $O(mnr)$ time and space, where $m = |A|$, $n = |B|$, and $r = |P|$ [Inf. Process. Lett., 2013]. We present a new solution for the STR-EC-LCS problem. Our algorithm computes an STR-EC-LCS in $O(n|\Sigma| + (L+1)(m-L+1)r)$ time, where $|\Sigma| \leq \min\{m, n\}$ is the number of distinct characters occurring in both $A$ and $B$, and $L$ is the length of the STR-EC-LCS. This algorithm is faster than the $O(mnr)$-time algorithm for short or long STR-EC-LCS (namely, $L \in O(1)$ or $m-L \in O(1)$), and is at least as efficient as the $O(mnr)$-time algorithm in all cases.
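As context for the result above, the baseline $O(mnr)$ dynamic program can be sketched as follows: alongside the usual pair of prefix lengths, the state tracks the current state of the KMP automaton of $P$, so that $P$ is never completed as a substring. This is a minimal sketch in the spirit of Wang et al.'s algorithm, not the paper's faster one; the function names are ours, and $P$ is assumed non-empty.

```python
def kmp_automaton(P):
    # Build the KMP failure function and return the transition function
    # delta(k, c): next automaton state from state k (0 <= k < len(P)) on c.
    r = len(P)
    fail = [0] * (r + 1)
    k = 0
    for i in range(1, r):
        while k > 0 and P[i] != P[k]:
            k = fail[k]
        if P[i] == P[k]:
            k += 1
        fail[i + 1] = k

    def delta(k, c):
        while k > 0 and P[k] != c:
            k = fail[k]
        return k + 1 if P[k] == c else 0

    return delta

def str_ec_lcs_length(A, B, P):
    # f[j][k]: best length of a common subsequence of A[:i], B[:j] whose
    # KMP state (w.r.t. P) is k < r, rolled over i.  State r is forbidden,
    # so P never occurs as a substring of the chosen subsequence.
    m, n, r = len(A), len(B), len(P)
    delta = kmp_automaton(P)
    NEG = float('-inf')
    f = [[NEG] * r for _ in range(n + 1)]
    for j in range(n + 1):
        f[j][0] = 0  # empty subsequence, automaton state 0
    for i in range(1, m + 1):
        g = [row[:] for row in f]  # "skip A[i]" transition
        for j in range(1, n + 1):
            for k in range(r):
                g[j][k] = max(g[j][k], g[j - 1][k])  # "skip B[j]"
            if A[i - 1] == B[j - 1]:
                c = A[i - 1]
                for k in range(r):
                    if f[j - 1][k] == NEG:
                        continue
                    k2 = delta(k, c)
                    if k2 < r:  # extending must not complete P
                        g[j][k2] = max(g[j][k2], f[j - 1][k] + 1)
        f = g
    return max(max(row) for row in f)
```

With `P` chosen so it never matches, this degenerates to the ordinary LCS length, which is a quick sanity check.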

Updated: 2020-01-17
• arXiv.cs.DS Pub Date : 2020-01-16
Marc Hellmuth; Carsten R. Seemann; Peter F. Stadler

Binary relations derived from labeled rooted trees play an important role in mathematical biology as formal models of evolutionary relationships. The (symmetrized) Fitch relation formalizes xenology as the pairs of genes separated by at least one horizontal transfer event. As a natural generalization, we consider symmetrized Fitch maps, that is, symmetric maps $\varepsilon$ that assign a subset of colors to each pair of vertices in $X$ and that can be explained by a tree $T$ with edges labeled with subsets of colors, in the sense that the color $m$ appears in $\varepsilon(x,y)$ if and only if $m$ appears in a label along the unique path between $x$ and $y$ in $T$. We first give an alternative characterization of the monochromatic case and then characterize symmetrized Fitch maps in terms of the compatibility of a certain set of quartets. We show that recognition of symmetrized Fitch maps is NP-complete in general, but fixed-parameter tractable. In the restricted case where $|\varepsilon(x,y)|\leq 1$ the problem becomes polynomial, since such maps coincide with the class of monochromatic Fitch maps, whose graph representations form precisely the class of complete multipartite graphs.

Updated: 2020-01-17
• arXiv.cs.DS Pub Date : 2020-01-16
Bartłomiej Dudek; Paweł Gawrychowski; Tatiana Starikovskaya

In the problem of $\texttt{Generalised Pattern Matching}\ (\texttt{GPM})$ [STOC'94, Muthukrishnan and Palem], we are given a text $T$ of length $n$ over an alphabet $\Sigma_T$, a pattern $P$ of length $m$ over an alphabet $\Sigma_P$, and a matching relation $\subseteq \Sigma_T \times \Sigma_P$, and must return all substrings of $T$ that match $P$ (reporting) or the number of mismatches between each substring of $T$ of length $m$ and $P$ (counting). In this work, we improve over all previously known algorithms for this problem for various parameters describing the input instance: * $\mathcal{D}\,$ being the maximum number of characters that match a fixed character, * $\mathcal{S}\,$ being the number of pairs of matching characters, * $\mathcal{I}\,$ being the total number of disjoint intervals of characters that match the $m$ characters of the pattern $P$. At the heart of our new deterministic upper bounds for $\mathcal{D}\,$ and $\mathcal{S}\,$ lies a faster construction of superimposed codes, which solves an open problem posed in [FOCS'97, Indyk] and may be of independent interest. To conclude, we demonstrate the first lower bounds for $\texttt{GPM}$. We start by showing that any deterministic or Monte Carlo algorithm for $\texttt{GPM}$ must use $\Omega(\mathcal{S})$ time, and then proceed to show higher lower bounds for combinatorial algorithms. These bounds show that our algorithms are almost optimal, unless a radically new approach is developed.
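To make the reporting and counting variants concrete, a brute-force matcher (nowhere near the paper's improved bounds) can serve as a reference. The encoding of the matching relation as a dict `match`, mapping each text character to the set of pattern characters it matches, is our own choice.

```python
def gpm_report(T, P, match):
    # Return all starting positions i where T[i:i+m] matches P under 'match'.
    m = len(P)
    return [i for i in range(len(T) - m + 1)
            if all(P[j] in match.get(T[i + j], set()) for j in range(m))]

def gpm_count_mismatches(T, P, match):
    # For each alignment, count positions where the relation does not hold.
    m = len(P)
    return [sum(P[j] not in match.get(T[i + j], set()) for j in range(m))
            for i in range(len(T) - m + 1)]
```

For example, a wildcard character `?` in the pattern is modeled by letting every text character match it.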

Updated: 2020-01-17
• arXiv.cs.DS Pub Date : 2016-11-04
Lucas Boczkowski; Uriel Feige; Amos Korman; Yoav Rodeh

We consider a search problem on trees in which the goal is to find an adversarially placed treasure, while relying on local, partial information. Specifically, each node in the tree holds a pointer to one of its neighbors, termed \emph{advice}. A node is faulty with probability $q$. The advice at a non-faulty node points to the neighbor that is closer to the treasure, and the advice at a faulty node points to a uniformly random neighbor. Crucially, the advice is {\em permanent}, in the sense that querying the same node again would yield the same answer. Let $\Delta$ denote the maximal degree. Roughly speaking, when considering the expected number of {\em moves}, i.e., edge traversals, we show that a phase transition occurs when the {\em noise parameter} $q$ is about $1/\sqrt{\Delta}$. Below the threshold, there exists an algorithm with expected move complexity $O(D\sqrt{\Delta})$, where $D$ is the depth of the treasure, whereas above the threshold, every search algorithm has an expected number of moves that is both exponential in $D$ and polynomial in the number of nodes~$n$. In contrast, if we require that the treasure be found with probability at least $1-\delta$, then for every fixed $\varepsilon > 0$, if $q<1/\Delta^{\varepsilon}$ then there exists a search strategy that with probability $1-\delta$ finds the treasure using $(\delta^{-1}D)^{O(\frac 1 \varepsilon)}$ moves. Moreover, we show that $(\delta^{-1}D)^{\Omega(\frac 1 \varepsilon)}$ moves are necessary. Besides the number of moves, we also study the number of advice {\em queries} required to find the treasure. Roughly speaking, for this complexity, we show similar threshold results to those stated above, with the parameter $D$ replaced by $\log n$.

Updated: 2020-01-17
• arXiv.cs.DS Pub Date : 2018-12-27
Jian Lin; Zhong Yuan Lai; Xiaopeng Li

Quantum algorithm design lies at the heart of applications of quantum computation and quantum simulation. Here we put forward a deep reinforcement learning (RL) architecture for automated algorithm design in the framework of the quantum adiabatic algorithm, where the optimal Hamiltonian path to reach a quantum ground state that encodes a computation problem is obtained by RL techniques. We benchmark our approach on Grover search and 3-SAT problems, and find that the adiabatic algorithm obtained by our RL approach leads to significant improvements in success probability and to computational speedups, for both moderate and large numbers of qubits, compared to conventional algorithms. The RL-designed algorithm is found to be qualitatively distinct from the linear algorithm in the resulting distribution of success probability. Considering the established complexity-equivalence of circuit and adiabatic quantum algorithms, we expect the RL-designed adiabatic algorithm to inspire novel circuit algorithms as well. Our approach offers a recipe to design quantum algorithms for generic problems through an automated RL process, paving a new way toward automated quantum algorithm design using artificial intelligence, potentially applicable to different quantum simulation and computation platforms, from trapped ions and optical lattices to superconducting-qubit devices.

Updated: 2020-01-17
• arXiv.cs.DS Pub Date : 2019-03-12
Maria Chudnovsky; Marcin Pilipczuk; Michał Pilipczuk; Stéphan Thomassé

A hole in a graph is an induced cycle of length at least $4$, and an antihole is the complement of an induced cycle of length at least $4$. A hole or antihole is long if its length is at least $5$. For an integer $k$, the $k$-prism is the graph consisting of two cliques of size $k$ joined by a matching. The complexity of Maximum (Weight) Independent Set (MWIS) in long-hole-free graphs remains an important open problem. In this paper we give a polynomial time algorithm to solve MWIS in long-hole-free graphs with no $k$-prism (for any fixed integer $k$), and a subexponential algorithm for MWIS in long-hole-free graphs in general. As a special case this gives a polynomial time algorithm to find a maximum weight clique in perfect graphs with no long antihole, and no hole of length $6$. The algorithms use the framework of minimal chordal completions and potential maximal cliques.

Updated: 2020-01-17
• arXiv.cs.DS Pub Date : 2019-04-29
Thomas Bläsius; Philipp Fischbeck; Tobias Friedrich; Maximilian Katzmann

The VertexCover problem is proven to be computationally hard in different ways: it is NP-complete to find an optimal solution and even NP-hard to find an approximation within reasonable factors. In contrast, recent experiments suggest that on many real-world networks the run time to solve VertexCover is far smaller than even the best known FPT approaches can explain. Similarly, greedy algorithms deliver very good approximations to the optimal solution in practice. We link these observations to two properties that are observed in many real-world networks, namely a heterogeneous degree distribution and high clustering. To formalize these properties and explain the observed behavior, we analyze how a branch-and-reduce algorithm performs on hyperbolic random graphs, which have become increasingly popular for modeling real-world networks. In fact, we are able to show that the VertexCover problem on hyperbolic random graphs can be solved in polynomial time, with high probability. The proof relies on interesting structural properties of hyperbolic random graphs. Since these predictions of the model are interesting in their own right, we conducted experiments on real-world networks showing that these properties are also observed in practice. When utilizing the same structural properties in an adaptive greedy algorithm, further experiments suggest that, on real instances, this leads to better approximations than the standard greedy approach within reasonable time.
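The "standard greedy" that the abstract contrasts against is the textbook max-degree heuristic; a minimal sketch follows (this is neither the paper's branch-and-reduce solver nor its adaptive greedy variant, and the function name is ours).

```python
def greedy_vertex_cover(adj):
    # adj: dict vertex -> set of neighbours (undirected graph).
    # Repeatedly add a vertex of maximum remaining degree to the cover
    # and delete its incident edges.
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    cover = set()
    while any(adj.values()):  # while some edge remains
        v = max(adj, key=lambda u: len(adj[u]))
        cover.add(v)
        for u in adj[v]:
            adj[u].discard(v)
        adj[v] = set()
    return cover
```

On heterogeneous (e.g. power-law) degree sequences this heuristic tends to pick the few high-degree hubs first, which is one intuition behind its good practical behavior.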

Updated: 2020-01-17
• arXiv.cs.DS Pub Date : 2019-05-09
Graham Cormode; Pavel Veselý

Quantiles, such as the median or percentiles, provide concise and useful information about the distribution of a collection of items, drawn from a totally ordered universe. We study data structures, called quantile summaries, which keep track of all quantiles, up to an error of at most $\varepsilon$. That is, an $\varepsilon$-approximate quantile summary first processes a stream of items and then, given any quantile query $0\le \phi\le 1$, returns an item from the stream, which is a $\phi'$-quantile for some $\phi' = \phi \pm \varepsilon$. We focus on comparison-based quantile summaries that can only compare two items and are otherwise completely oblivious of the universe. The best such deterministic quantile summary to date, due to Greenwald and Khanna (SIGMOD '01), stores at most $O(\frac{1}{\varepsilon}\cdot \log \varepsilon N)$ items, where $N$ is the number of items in the stream. We prove that this space bound is optimal by showing a matching lower bound. Our result thus rules out the possibility of constructing a deterministic comparison-based quantile summary in space $f(\varepsilon)\cdot o(\log N)$, for any function $f$ that does not depend on $N$. As a corollary, we improve the lower bound for biased quantiles, which provide a stronger, relative-error guarantee of $(1\pm \varepsilon)\cdot \phi$, and for other related computational tasks.
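To make the query interface concrete, here is a toy, exact ($\varepsilon = 0$) comparison-based summary: it answers $\phi$-quantile queries by rank but stores every item, whereas the Greenwald–Khanna summary keeps only $O(\frac{1}{\varepsilon}\log \varepsilon N)$ of them. The class name and 1-based rank convention are ours.

```python
import bisect
import math

class ExactQuantiles:
    # Exact (epsilon = 0) quantile "summary": stores the whole stream
    # in sorted order; only compares items, never inspects their values.
    def __init__(self):
        self.items = []

    def process(self, x):
        bisect.insort(self.items, x)  # insert keeping sorted order

    def query(self, phi):
        # Return the phi-quantile: the item of rank ceil(phi * N), rank >= 1.
        n = len(self.items)
        rank = max(1, math.ceil(phi * n))
        return self.items[rank - 1]
```

An $\varepsilon$-approximate summary is allowed to return an item of any rank within $\pm\varepsilon N$ of the requested one, which is what lets it discard most items.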

Updated: 2020-01-17
• arXiv.cs.DS Pub Date : 2019-11-16
Andreas Galanis; Leslie Ann Goldberg; Heng Guo; Kuan Yang

We give the first efficient algorithm to approximately count the number of solutions in the random $k$-SAT model when the density of the formula scales exponentially with $k$. The best previous counting algorithm was due to Montanari and Shah and was based on the correlation decay method, which works up to densities $(1+o_k(1))\frac{2\log k}{k}$, the Gibbs uniqueness threshold for the model. Instead, our algorithm harnesses a recent technique by Moitra to work for random formulas. The main challenge in our setting is to account for the presence of high-degree variables whose marginal distributions are hard to control and which cause significant correlations within the formula.

Updated: 2020-01-17
• arXiv.cs.DS Pub Date : 2020-01-14

We give a 1.488-approximation for the classic scheduling problem of minimizing total weighted completion time on unrelated machines. This is a considerable improvement on the recent breakthrough $(1.5 - 10^{-7})$-approximation (STOC 2016, Bansal-Srinivasan-Svensson) and the follow-up $(1.5 - 1/6000)$-approximation (FOCS 2017, Li). Bansal et al. introduced a novel rounding scheme yielding strong negative correlations for the first time and applied it to the scheduling problem to obtain their breakthrough, which resolved the open problem of whether one can beat the long-standing $1.5$-approximation barrier based on independent rounding. Our key technical contribution is in achieving significantly stronger negative correlations via iterative fair contention resolution, which is of independent interest. Previously, Bansal et al. obtained strong negative correlations via a variant of pipage-type rounding, and Li used it as a black box.

Updated: 2020-01-16
• arXiv.cs.DS Pub Date : 2020-01-14
Giuseppe Persiano; Kevin Yeo

In this paper, we study the static cell probe complexity of non-adaptive data structures that maintain a subset of $n$ points from a universe consisting of $m=n^{1+\Omega(1)}$ points. A data structure is non-adaptive when the memory locations accessed during a query depend only on the query inputs and not on the contents of memory. We prove an $\Omega(\log m / \log (sw/n\log m))$ static cell probe complexity lower bound for non-adaptive data structures that solve the fundamental dictionary problem, where $s$ denotes the space of the data structure in number of cells and $w$ is the cell size in bits. Our lower bounds hold for all word sizes, including the bit probe model ($w = 1$), and are matched by the upper bounds of Boninger et al. [FSTTCS'17]. Our results imply a sharp dichotomy between dictionary data structures with one round of adaptivity and those with at least two rounds. We show that dictionary constructions with $O(1)$, or even $O(\log^{1-\epsilon}(m))$, overhead are only achievable with at least two rounds of adaptivity. In particular, many $O(1)$-overhead dictionary constructions with two rounds of adaptivity, such as cuckoo hashing, are optimal in terms of adaptivity. On the other hand, non-adaptive dictionaries must use significantly more overhead. Finally, our results also imply static lower bounds for the non-adaptive predecessor problem. Our static lower bounds exceed the previous best known lower bounds of $\Omega(\log m / \log w)$ for the dynamic predecessor problem by Boninger et al. [FSTTCS'17] and Ramamoorthy and Rao [CCC'18] in the natural setting of linear space $s = \Theta(n)$, where each point fits in a single cell, $w = \Theta(\log m)$. Furthermore, our results are stronger, as they apply to the static setting, unlike the previous lower bounds, which applied only in the dynamic setting.

Updated: 2020-01-16
• arXiv.cs.DS Pub Date : 2020-01-15
Jiaqi Dong; Runyu Zhang; Chaoshu Yang; Yujuan Tan; Duo Liu

Frequent-pattern mining is a common approach to reveal valuable hidden trends in data. However, existing frequent-pattern mining algorithms are designed for DRAM rather than for persistent memories (PMs); because the characteristics of DRAM and PMs differ substantially, running these algorithms on PMs can incur severe performance and energy overheads. In this paper, we propose an efficient and wear-leveling-aware frequent-pattern mining scheme, WFPM, to solve this problem. The proposed WFPM is evaluated by a series of experiments on realistic datasets from diversified application scenarios, where WFPM achieves a 32.0% performance improvement and prolongs the NVM lifetime of the header table by 7.4x over the EvFP-Tree.

Updated: 2020-01-16
• arXiv.cs.DS Pub Date : 2020-01-15
Felix Reidl; Blair D. Sullivan

We present an algorithm to count the number of occurrences of a pattern graph $H$ as an induced subgraph in a host graph $G$. If $G$ belongs to a bounded expansion class, the algorithm runs in linear time. Our design choices are motivated by the need for an approach that can be engineered into a practical implementation for sparse host graphs. Specifically, we introduce a decomposition of the pattern $H$ called a counting dag $\vec C(H)$ which encodes an order-aware, inclusion-exclusion counting method for $H$. Given such a counting dag and a suitable linear ordering $\mathbb G$ of $G$ as input, our algorithm can count the number of times $H$ appears as an induced subgraph in $G$ in time $O(\|\vec C\| \cdot h \text{wcol}_{h}(\mathbb G)^{h-1} |G|)$, where $\text{wcol}_h(\mathbb G)$ denotes the maximum size of the weakly $h$-reachable sets in $\mathbb G$. This implies, combined with previous results, an algorithm with running time $O(4^{h^2}h (\text{wcol}_h(G)+1)^{h^3} |G|)$ which only takes $H$ and $G$ as input. We note that with a small modification, our algorithm can instead use strongly $h$-reachable sets with running time $O(\|\vec C\| \cdot h \text{col}_{h}(\mathbb G)^{h-1} |G|)$, resulting in an overall complexity of $O(4^{h^2}h \text{col}_h(G)^{h^2} |G|)$ when only given $H$ and $G$. Because orderings with small weakly/strongly reachable sets can be computed relatively efficiently in practice [11], our algorithm provides a promising alternative to algorithms using the traditional $p$-treedepth colouring framework [13]. We describe preliminary experimental results from an initial open source implementation which highlight its potential.
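For contrast with the linear-time algorithm above, the problem itself has a simple but exponential brute-force baseline: count the vertex subsets of $G$ whose induced subgraph is isomorphic to $H$, checking isomorphism by trying all bijections. This sketch is only a reference semantics, not the paper's counting-dag method; the function names and dict-of-sets graph encoding are ours.

```python
from itertools import combinations, permutations

def induced_count(H, G):
    # H, G: dict vertex -> set of neighbours (undirected, no self-loops).
    # Count |H|-vertex subsets of G inducing a graph isomorphic to H.
    h = sorted(H)

    def edges(adj, verts):
        return {frozenset((u, v)) for u in verts for v in adj[u] if v in verts}

    eh = edges(H, h)
    cnt = 0
    for sub in combinations(sorted(G), len(h)):
        eg = edges(G, sub)
        if len(eg) != len(eh):  # cheap pre-filter on edge counts
            continue
        for perm in permutations(sub):  # try all bijections h -> sub
            f = dict(zip(h, perm))
            if all((frozenset((f[u], f[v])) in eg) == (frozenset((u, v)) in eh)
                   for u, v in combinations(h, 2)):
                cnt += 1
                break  # count each subset once
        else:
            continue
    return cnt
```

The paper's algorithm avoids exactly this blow-up by exploiting a weak-coloring order of the sparse host graph.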

Updated: 2020-01-16
• arXiv.cs.DS Pub Date : 2020-01-15
Dmitry Kosolobov; Oleg Merkurev

A skeleton Huffman tree is a Huffman tree in which all disjoint maximal perfect subtrees are shrunk into leaves. Skeleton Huffman trees, besides saving storage space, are also used for faster decoding and for speeding up Huffman-shaped wavelet trees. In 2017 Klein et al. introduced an optimal skeleton tree: for given symbol frequencies, it has the least number of nodes among all optimal prefix-free code trees (not necessarily Huffman's) with shrunk perfect subtrees. Klein et al. described a simple algorithm that, for fixed codeword lengths, finds a skeleton tree with the least number of nodes; with this algorithm one can process each set of optimal codeword lengths to find an optimal skeleton tree. However, there are exponentially many such sets in the worst case. We describe an $O(n^2\log n)$-time algorithm that, given $n$ symbol frequencies, constructs an optimal skeleton tree and its corresponding optimal code.
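As background, the standard Huffman construction (whose trees the skeleton construction shrinks) can be sketched as follows; the paper's optimal skeleton-tree algorithm itself is not reproduced here, and the function name is ours.

```python
import heapq
from itertools import count

def huffman_lengths(freqs):
    # Return the Huffman codeword length of each symbol (by index).
    # Leaves are ints, internal nodes are (left, right) pairs; the tie
    # counter keeps heap entries comparable.
    tie = count()
    heap = [(f, next(tie), i) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    if len(heap) == 1:
        return [1]
    while len(heap) > 1:  # repeatedly merge the two lightest subtrees
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (a, b)))
    _, _, root = heap[0]
    lengths = [0] * len(freqs)

    def walk(node, depth):
        if isinstance(node, int):
            lengths[node] = depth
        else:
            walk(node[0], depth + 1)
            walk(node[1], depth + 1)

    walk(root, 0)
    return lengths
```

A perfect subtree arises whenever a whole range of codewords shares one length, which is what the skeleton representation collapses into a single leaf.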

Updated: 2020-01-16
• arXiv.cs.DS Pub Date : 2020-01-15
Guy Steele; Sebastiano Vigna

Congruential pseudorandom number generators rely on good multipliers, that is, integers that have good performance with respect to the spectral test. We provide lists of multipliers with a good lattice structure up to dimension eight for generators with typical power-of-two moduli, analyzing in detail multipliers close to the square root of the modulus, whose product can be computed quickly.
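A minimal sketch of the generator family in question: a multiplicative congruential generator with a power-of-two modulus. The multiplier below is an arbitrary odd value with $a \bmod 8 = 5$ (a necessary condition for the maximal period $2^{w-2}$); it is NOT one of the spectrally tested multipliers tabulated in the paper.

```python
M = 2 ** 32
A = 0xadb4a92d  # hypothetical multiplier, a % 8 == 5; not from the paper

def mcg_stream(seed, n, a=A, m=M):
    # Multiplicative congruential generator x <- a*x mod 2^w.
    # The state must be odd; the low bits of x have short periods,
    # so we emit only the high half of each state.
    x = seed | 1
    out = []
    for _ in range(n):
        x = (a * x) % m
        out.append(x >> 16)
    return out
```

The paper's point about multipliers near $\sqrt{m}$ is that the product $a \cdot x$ then fits in fewer machine words and can be computed faster.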

Updated: 2020-01-16
• arXiv.cs.DS Pub Date : 2020-01-15
Tyler Helmuth; Will Perkins; Samantha Petti

We improve upon all known lower bounds on the critical fugacity and critical density of the hard sphere model in dimensions two and higher. As the dimension tends to infinity our improvements are by factors of $2$ and $1.7$, respectively. We make these improvements by utilizing techniques from theoretical computer science to show that a certain Markov chain for sampling from the hard sphere model mixes rapidly at low enough fugacities. We then prove an equivalence between optimal spatial and temporal mixing for hard spheres, an equivalence that is well-known for a wide class of discrete spin systems.

Updated: 2020-01-16
• arXiv.cs.DS Pub Date : 2020-01-15
Falko Hegerfeld; Stefan Kratsch

A breakthrough result of Cygan et al. (FOCS 2011) showed that connectivity problems parameterized by treewidth can be solved much faster than the previously best known time $\mathcal{O}^*(2^{\mathcal{O}(tw \log(tw))})$. Using their Cut\&Count technique, they obtained $\mathcal{O}^*(\alpha^{tw})$ time algorithms for many such problems. Moreover, they proved these running times to be optimal assuming the Strong Exponential-Time Hypothesis. Unfortunately, like other dynamic programming algorithms on tree decompositions, these algorithms also require exponential space, and this is widely believed to be unavoidable. In contrast, for the slightly larger parameter called treedepth, there are already several examples of matching the time bounds obtained for treewidth, but using only polynomial space. Nevertheless, this has remained open for connectivity problems. In the present work, we close this knowledge gap by applying the Cut\&Count technique to graphs of small treedepth. While the general idea is unchanged, we have to design novel procedures for counting consistently cut solution candidates using only polynomial space. Concretely, we obtain time $\mathcal{O}^*(3^d)$ and polynomial space for Connected Vertex Cover, Feedback Vertex Set, and Steiner Tree on graphs of treedepth $d$. Similarly, we obtain time $\mathcal{O}^*(4^d)$ and polynomial space for Connected Dominating Set and Connected Odd Cycle Transversal.

Updated: 2020-01-16
• arXiv.cs.DS Pub Date : 2020-01-15
Yujie Wang

The self-improving sorter proposed by Ailon et al. consists of two phases: a relatively long training phase and a rapid operation phase. In this study, we develop an efficient way to further improve this sorter by approximating its training phase, making training faster without sacrificing much performance in the operation phase. When testing the performance of this approximated sorter, it is essential to ensure the accuracy of the estimated entropy. We therefore derive a formula for an upper bound on the error of the entropy estimated from input data with unknown distributions. Our work contributes to the faster use of this self-improving sorter on large datasets.
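The quantity whose error is being bounded is the entropy estimated from empirical frequencies; a minimal plug-in estimator looks as follows (a generic sketch, not the paper's estimator or error formula).

```python
from collections import Counter
import math

def empirical_entropy(sample):
    # Plug-in (maximum-likelihood) entropy estimate in bits:
    # H_hat = -sum_k (c_k/n) * log2(c_k/n) over observed keys k.
    n = len(sample)
    counts = Counter(sample)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Such plug-in estimates are biased low for small samples, which is precisely why an explicit error bound is needed when benchmarking against the true input distribution.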

Updated: 2020-01-16
• arXiv.cs.DS Pub Date : 2019-04-06
Nikolaj Tatti

Decomposing a graph into a hierarchical structure via $k$-core analysis is a standard operation in any modern graph-mining toolkit. $k$-core decomposition is a simple and efficient method that makes it possible to analyze a graph beyond its mere degree distribution. More specifically, it is used to identify areas in the graph of increasing centrality and connectedness, and to reveal the structural organization of the graph. Despite the fact that $k$-core analysis relies on vertex degrees, $k$-cores do not satisfy a certain, rather natural, density property. Simply put, the most central $k$-core is not necessarily the densest subgraph. This inconsistency between $k$-cores and graph density provides the basis of our study. We start by defining what it means for a subgraph to be locally dense, and we show that our definition entails a nested chain decomposition of the graph, similar to the one given by $k$-cores, but in this case the components are arranged in order of increasing density. We show that such a locally-dense decomposition for a graph $G=(V,E)$ can be computed in polynomial time. The running time of the exact decomposition algorithm is $O(|V|^2|E|)$ but is significantly faster in practice. In addition, we develop a linear-time algorithm that provides a factor-2 approximation to the optimal locally-dense decomposition. Furthermore, we show that the $k$-core decomposition is also a factor-2 approximation; however, as demonstrated by our experimental evaluation, in practice $k$-cores have a different structure than locally-dense subgraphs, and, as predicted by the theory, $k$-cores are not always well-aligned with graph density.
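The $k$-core decomposition that serves as the paper's baseline is computed by the classic peeling procedure: repeatedly remove a vertex of minimum remaining degree. A simple (non-bucketed, hence quadratic) sketch:

```python
def core_numbers(adj):
    # adj: dict vertex -> set of neighbours (undirected).
    # Peel vertices in order of minimum remaining degree; the core number
    # of v is the largest degree threshold active when v is removed.
    deg = {v: len(ns) for v, ns in adj.items()}
    alive = set(adj)
    core = {}
    k = 0
    while alive:
        v = min(alive, key=lambda u: deg[u])
        k = max(k, deg[v])  # thresholds only ever increase
        core[v] = k
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
    return core
```

With bucketed degree queues this runs in linear time, which is what makes $k$-core analysis so cheap compared to exact locally-dense decomposition.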

Updated: 2020-01-16
• arXiv.cs.DS Pub Date : 2019-07-11
Yassine Hamoudi; Patrick Rebentrost; Ansis Rosmanis; Miklos Santha

Submodular functions are set functions mapping every subset of some ground set of size $n$ into the real numbers and satisfying the diminishing returns property. Submodular minimization is an important field in discrete optimization theory due to its relevance for various branches of mathematics, computer science and economics. The currently fastest strongly polynomial algorithm for exact minimization [LSW15] runs in time $\widetilde{O}(n^3 \cdot \mathrm{EO} + n^4)$ where $\mathrm{EO}$ denotes the cost to evaluate the function on any set. For functions with range $[-1,1]$, the best $\epsilon$-additive approximation algorithm [CLSW17] runs in time $\widetilde{O}(n^{5/3}/\epsilon^{2} \cdot \mathrm{EO})$. In this paper we present a classical and a quantum algorithm for approximate submodular minimization. Our classical result improves on the algorithm of [CLSW17] and runs in time $\widetilde{O}(n^{3/2}/\epsilon^2 \cdot \mathrm{EO})$. Our quantum algorithm is, up to our knowledge, the first attempt to use quantum computing for submodular optimization. The algorithm runs in time $\widetilde{O}(n^{5/4}/\epsilon^{5/2} \cdot \log(1/\epsilon) \cdot \mathrm{EO})$. The main ingredient of the quantum result is a new method for sampling with high probability $T$ independent elements from any discrete probability distribution of support size $n$ in time $O(\sqrt{Tn})$. Previous quantum algorithms for this problem were of complexity $O(T\sqrt{n})$.

Updated: 2020-01-16
• arXiv.cs.DS Pub Date : 2019-09-26
Philip Bille; Inge Li Gørtz; Teresa Anna Steiner

Given a string $S$ of length $n$, the classic string indexing problem is to preprocess $S$ into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is given in compressed form and the goal is to achieve query time that is fast in terms of the compressed size of the pattern. This captures the common client-server scenario, where a client submits a query and communicates it in compressed form to a server. Instead of the server decompressing the query before processing it, we consider how to efficiently process the compressed query directly. Our main result is a novel linear space data structure that achieves near-optimal query time for patterns compressed with the classic Lempel-Ziv compression scheme. Along the way we develop several data structural techniques of independent interest, including a novel data structure that compactly encodes all LZ77 compressed suffixes of a string in linear space and a general decomposition of tries that reduces the search time from logarithmic in the size of the trie to logarithmic in the length of the pattern.
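For readers unfamiliar with the compression scheme the queries arrive in, a brute-force (quadratic) LZ77-style factorization and its decoder can be sketched as follows; production compressors find the longest previous factor with suffix structures instead, and the factor encoding here is our own simplification.

```python
def lz77_factorize(s):
    # Greedy left-to-right parse: each factor is ('lit', c) or
    # ('copy', src, length), where the copy source may overlap the
    # current position (self-referential LZ77).
    factors = []
    i, n = 0, len(s)
    while i < n:
        best_len, best_src = 0, 0
        for j in range(i):  # brute-force longest previous match
            l = 0
            while i + l < n and s[j + l] == s[i + l]:
                l += 1
            if l > best_len:
                best_len, best_src = l, j
        if best_len == 0:
            factors.append(('lit', s[i]))
            i += 1
        else:
            factors.append(('copy', best_src, best_len))
            i += best_len
    return factors

def lz77_decode(factors):
    out = []
    for f in factors:
        if f[0] == 'lit':
            out.append(f[1])
        else:
            _, src, length = f
            for t in range(length):      # copy byte by byte so that
                out.append(out[src + t])  # overlapping copies work
    return ''.join(out)
```

In the paper's client-server scenario the client would ship `factors` rather than the plain pattern, and the server searches without ever running `lz77_decode`.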

Updated: 2020-01-16
• arXiv.cs.DS Pub Date : 2019-10-23

Graph Neural Networks (GNNs) are a powerful representational tool for solving problems on graph-structured inputs. In almost all cases so far, however, they have been applied to directly recovering a final solution from raw inputs, without explicit guidance on how to structure their problem-solving. Here, instead, we focus on learning in the space of algorithms: we train several state-of-the-art GNN architectures to imitate individual steps of classical graph algorithms, parallel (breadth-first search, Bellman-Ford) as well as sequential (Prim's algorithm). As graph algorithms usually rely on making discrete decisions within neighbourhoods, we hypothesise that maximisation-based message passing neural networks are best-suited for such objectives, and validate this claim empirically. We also demonstrate how learning in the space of algorithms can yield new opportunities for positive transfer between tasks---showing how learning a shortest-path algorithm can be substantially improved when simultaneously learning a reachability algorithm.
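One of the classical targets the GNNs imitate, Bellman-Ford, is itself naturally phrased as rounds of message passing: each outer iteration relaxes every edge once, exactly the per-step supervision signal described above. A minimal sketch:

```python
def bellman_ford(n, edges, src):
    # edges: list of (u, v, w) directed edges; n nodes labelled 0..n-1.
    # After round t, dist[v] is optimal over paths of at most t edges --
    # each round is one "message-passing" step over the graph.
    INF = float('inf')
    dist = [INF] * n
    dist[src] = 0
    for _ in range(n - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist
```

Reachability (the transfer task mentioned above) is the same recurrence with all weights zero-or-one collapsed to a boolean, which is one intuition for why the two tasks transfer.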

Updated: 2020-01-16
• arXiv.cs.DS Pub Date : 2020-01-13
William B. Langdon

random_tree() is a linear-time, linear-space C++ implementation that can create trees of up to a billion nodes for genetic programming and genetic improvement experiments. A 3.60GHz CPU can generate more than 18 million random GP program tree nodes per second.
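The linear-time idea can be illustrated independently of the author's C++ code: attach each new node to a uniformly random earlier node, so generating $n$ nodes costs $O(n)$ time and space. This Python sketch is only an illustration of that scheme, not a port of random_tree().

```python
import random

def random_tree_edges(n, seed=None):
    # Uniform-attachment random tree on nodes 0..n-1: node i (i >= 1)
    # picks a uniformly random parent among the earlier nodes, giving
    # n-1 edges and, by construction, no cycles.
    rng = random.Random(seed)
    return [(rng.randrange(i), i) for i in range(1, n)]
```

Connectivity is automatic: every node is linked to a node created before it, so the whole edge list forms a single tree.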

Updated: 2020-01-15
• arXiv.cs.DS Pub Date : 2020-01-13
Subhrangsu Mandal; Anisur Rahaman Molla; William K. Moses Jr

The graph exploration problem requires a group of mobile robots, initially placed arbitrarily on the nodes of a graph, to work collaboratively to explore the graph such that each node is eventually visited by at least one robot. One important requirement of exploration is the {\em termination} condition, i.e., the robots must know that exploration is completed. The problem of live exploration of a dynamic ring using mobile robots was recently introduced in [Di Luna et al., ICDCS 2016]. There, the authors proposed multiple algorithms to solve exploration in fully synchronous and semi-synchronous settings with various guarantees when $2$ robots were involved. They also showed that, under certain assumptions, exploration of the ring with $2$ robots is impossible. An important question left open was how the presence of $3$ robots would affect the results. In this paper, we settle this question in the fully synchronous setting and also show how to extend our results to a semi-synchronous setting. In particular, we present algorithms for exploration with explicit termination using $3$ robots in conjunction with either (i) unique IDs of the robots and edge crossing detection capability (i.e., two robots moving in opposite directions through an edge in the same round can detect each other), or (ii) access to randomness. The time complexity of our deterministic algorithm is asymptotically optimal. We also provide complementary impossibility results showing that there does not exist any explicit termination algorithm for $2$ robots. The theoretical analysis and comprehensive simulations of our algorithm show its effectiveness and efficiency in dynamic rings. We also present an algorithm to achieve exploration with partial termination using $3$ robots in the semi-synchronous setting.

Updated: 2020-01-15
• arXiv.cs.DS Pub Date : 2020-01-13
Feras A. Saad; Cameron E. Freer; Martin C. Rinard; Vikash K. Mansinghka

This paper addresses a fundamental problem in random variate generation: given access to a random source that emits a stream of independent fair bits, what is the most accurate and entropy-efficient algorithm for sampling from a discrete probability distribution $(p_1, \dots, p_n)$, where the probabilities of the output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ of the sampling algorithm must be specified using at most $k$ bits of precision? We present a theoretical framework for formulating this problem and provide new techniques for finding sampling algorithms that are optimal both statistically (in the sense of sampling accuracy) and information-theoretically (in the sense of entropy consumption). We leverage these results to build a system that, for a broad family of measures of statistical accuracy, delivers a sampling algorithm whose expected entropy usage is minimal among those that induce the same distribution (i.e., is "entropy-optimal") and whose output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ is a closest approximation to the target distribution $(p_1, \dots, p_n)$ among all entropy-optimal sampling algorithms that operate within the specified $k$-bit precision. This optimal approximate sampler is also a closer approximation than any (possibly entropy-suboptimal) sampler that consumes a bounded amount of entropy with the specified precision, a class which includes floating-point implementations of inversion sampling and related methods found in many software libraries. We evaluate the accuracy, entropy consumption, precision requirements, and wall-clock runtime of our optimal approximate sampling algorithms on a broad set of distributions, demonstrating the ways that they are superior to existing approximate samplers and establishing that they often consume significantly fewer resources than are needed by exact samplers.
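For contrast with the entropy-optimal samplers the paper constructs, here is the naive fair-bit baseline at $k$-bit precision: draw a uniform $k$-bit integer (always consuming exactly $k$ fair bits) and invert the cumulative distribution. Probabilities are encoded, as in the problem statement, as integers summing to $2^k$. This sketch is our own baseline, not the paper's algorithm.

```python
def sample(weights, k, next_bit):
    # weights: non-negative integers with sum == 2**k, so weights[i]/2**k
    # is the k-bit-precision probability of outcome i.
    # next_bit: callable returning one fair random bit (0 or 1).
    assert sum(weights) == 1 << k
    u = 0
    for _ in range(k):              # always consumes exactly k fair bits
        u = (u << 1) | next_bit()
    acc = 0
    for i, w in enumerate(weights):  # inversion: find the bucket of u
        acc += w
        if u < acc:
            return i
    raise AssertionError("unreachable: weights sum to 2**k")
```

An entropy-optimal sampler instead consumes close to $H(\hat p)$ bits per draw on average, which can be far below $k$ for skewed distributions.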

Updated: 2020-01-15
• arXiv.cs.DS Pub Date : 2020-01-14
Debasis Dwibedy; Rakesh Mohanty

Online scheduling has been a well-studied and challenging research problem over the last five decades, since the pioneering work of Graham, with immense practical significance in applications such as interactive parallel processing, routing in communication networks, distributed data management, client-server communication, traffic management in transportation, and industrial manufacturing and production. In this problem, a sequence of jobs is received one by one by the scheduler for scheduling over a number of machines. On arrival of a job, the scheduler assigns the job irrevocably to a machine before the next job becomes available, with the objective of minimizing the completion time of the scheduled jobs. This paper highlights the state-of-the-art contributions to online scheduling of a sequence of independent jobs on identical and uniformly related machines, with a special focus on preemptive and non-preemptive processing formats, considering makespan minimization as the optimality criterion. We present the fundamental aspects of online scheduling from a beginner's perspective along with a background on the general scheduling framework. Important competitive-analysis results obtained by well-known deterministic and randomized online scheduling algorithms in the literature are presented, along with research challenges and open problems. Two emerging recent trends, resource augmentation and semi-online scheduling, are discussed as motivation for future research.
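
For readers new to the area, Graham's classic list scheduling, the starting point of this literature, fits in a few lines (a minimal illustration, not taken from the survey's text): each arriving job goes to the currently least-loaded machine, which is $(2 - 1/m)$-competitive for makespan on $m$ identical machines.

```python
def list_schedule(jobs, m):
    """Graham's online list scheduling: assign each arriving job
    (processing time p) to the currently least-loaded of m identical
    machines; returns the resulting makespan."""
    loads = [0] * m
    for p in jobs:
        i = loads.index(min(loads))   # least-loaded machine
        loads[i] += p
    return max(loads)                 # makespan
```

For example, on jobs (2, 3, 4, 6) and two machines the greedy rule yields makespan 9, whereas an offline optimum pairs (2, 6) and (3, 4) for makespan 8, illustrating the competitive gap.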

Updated: 2020-01-15
• arXiv.cs.DS Pub Date : 2020-01-14
Stefan Böttcher; Rita Hartel; Sven Peeters

Like [1], we present an algorithm to compute the simulation of a query pattern in a graph of labeled nodes and unlabeled edges. However, our algorithm works on a compressed graph grammar, instead of on the original graph. The speed-up of our algorithm compared to the algorithm in [1] grows with the size of the graph and with the compression strength.

Updated: 2020-01-15
• arXiv.cs.DS Pub Date : 2020-01-14
Stefan Kratsch; Florian Nelles

Computing all-pairs shortest paths is a fundamental and much-studied problem with many applications. Unfortunately, despite intense study, there are still no significantly faster algorithms for it than the $\mathcal{O}(n^3)$-time algorithm due to Floyd and Warshall (1962). Somewhat faster algorithms exist for the vertex-weighted version if fast matrix multiplication may be used: Yuster (SODA 2009) gave an algorithm running in time $\mathcal{O}(n^{2.842})$, but no combinatorial, truly subcubic algorithm is known. Motivated by the recent framework of efficient parameterized algorithms (or "FPT in P"), we investigate the influence of the graph parameters clique-width ($cw$) and modular-width ($mw$) on the running times of algorithms for solving All-Pairs Shortest Paths. We obtain efficient (and combinatorial) parameterized algorithms on non-negative vertex-weighted graphs running in time $\mathcal{O}(cw^2n^2)$ and $\mathcal{O}(mw^2n + n^2)$, respectively. If fast matrix multiplication is allowed, then the latter can be improved to $\mathcal{O}(mw^{1.842}n + n^2)$ using the algorithm of Yuster as a black box. The algorithm relative to modular-width is adaptive, meaning that its running time matches the best unparameterized algorithm for parameter value $mw$ equal to $n$, and it already outperforms that algorithm for $mw \in \mathcal{O}(n^{1 - \varepsilon})$ for any $\varepsilon > 0$.
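
For context, the cubic baseline that the parameterized algorithms are measured against is the Floyd-Warshall recurrence $d_{ij} \gets \min(d_{ij}, d_{ik} + d_{kj})$; a minimal sketch (not the paper's parameterized algorithm):

```python
def floyd_warshall(dist):
    """All-pairs shortest paths in O(n^3) time: dist is an n x n matrix
    of direct edge weights (math.inf where there is no edge, 0 on the
    diagonal); it is updated in place to shortest-path distances."""
    n = len(dist)
    for k in range(n):            # allow k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```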

Updated: 2020-01-15
• arXiv.cs.DS Pub Date : 2019-05-09
Hans L. Bodlaender; Lars Jaffke; Jan Arne Telle

In this work, we give a structural lemma on merges of typical sequences, a notion that was introduced in 1991 [Lagergren and Arnborg, Bodlaender and Kloks, both ICALP 1991] to obtain constructive linear time parameterized algorithms for treewidth and pathwidth. The lemma addresses a runtime bottleneck in those algorithms but so far it does not lead to asymptotically faster algorithms. However, we apply the lemma to show that the cutwidth and the modified cutwidth of series parallel digraphs can be computed in $\mathcal{O}(n^2)$ time.

Updated: 2020-01-15
• arXiv.cs.DS Pub Date : 2019-10-08
Ananda Theertha Suresh

For a dataset of label-count pairs, an anonymized histogram is the multiset of counts. Anonymized histograms appear in various potentially sensitive contexts such as password-frequency lists, degree distributions in social networks, and estimation of symmetric properties of discrete distributions. Motivated by these applications, we propose the first differentially private mechanism to release anonymized histograms that achieves a near-optimal privacy-utility trade-off both in terms of the number of items and the privacy parameter. Further, if the underlying histogram is given in a compact format, the proposed algorithm runs in time sub-linear in the number of items. For anonymized histograms generated from unknown discrete distributions, we show that the released histogram can be directly used for estimating symmetric properties of the underlying distribution.

Updated: 2020-01-15
• arXiv.cs.DS Pub Date : 2019-10-23
I. Vinod Reddy

In this paper, we study several coloring problems on graphs from the viewpoint of parameterized complexity. We show that the Precoloring Extension and Equitable Coloring problems are fixed-parameter tractable (FPT) parameterized by the distance to threshold graphs. We also study List k-Coloring and show that the problem is NP-complete on split graphs but FPT on split graphs when parameterized by solution size.

Updated: 2020-01-15
• arXiv.cs.DS Pub Date : 2019-12-13
Luis Ignacio Lopera González (Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany); Adrian Derungs (Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany); Oliver Amft (Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany)

In this paper, we introduce the increasing belief criterion in association rule mining. The criterion uses a recursive application of Bayes' theorem to compute a rule's belief. Extracted rules are required to have their belief increase with their last observation. We extend the taxonomy of association rule mining algorithms with a new branch for Bayesian rule mining (BRM), which uses increasing belief as the rule selection criterion. In contrast, the well-established frequent association rule mining (FRM) branch relies on the minimum-support concept to extract rules. We derive properties of the increasing belief criterion, such as the increasing belief boundary, no-prior-worries, and conjunctive premises. Subsequently, we implement a BRM algorithm using the increasing belief criterion, and illustrate its functionality in three experiments: (1) a proof-of-concept to illustrate BRM properties, (2) an analysis relating socioeconomic information and chemical exposure data, and (3) mining behaviour routines in patients undergoing neurological rehabilitation. We illustrate how BRM is capable of extracting rare rules and does not suffer from support dilution. Furthermore, we show that BRM focuses on the individual event generating processes, while FRM focuses on their commonalities. We consider BRM's increasing belief as an alternative criterion to thresholds on rule support, as often applied in FRM, to determine rule usefulness.

Updated: 2020-01-15
• arXiv.cs.DS Pub Date : 2020-01-11
Jean Claude Bajard; Jérémy Marrez; Thomas Plantard; Pascal Véron

Polynomial Modular Number System (PMNS) is a convenient number system for modular arithmetic, introduced in 2004. The main motivation was to accelerate arithmetic modulo an integer $p$. An existence theorem of PMNS with specific properties was given. The construction of such systems relies on sparse polynomials whose roots modulo $p$ can be chosen as radices of this kind of positional representation. However, the choice of those polynomials and the search for their roots are not trivial. In this paper, we introduce a general theorem on the existence of PMNS and we provide bounds on the size of the digits used to represent an integer modulo $p$. Then, we present classes of suitable polynomials for obtaining systems with efficient arithmetic. Finally, given a prime $p$, we evaluate the number of roots of polynomials modulo $p$ in order to estimate the number of PMNS bases we can reach. Hence, for a fixed prime $p$, it is possible to get numerous PMNS, which can be used efficiently for different applications based on large prime finite fields, such as those found in cryptography, like RSA, Diffie-Hellman key exchange, and ECC (Elliptic Curve Cryptography).

Updated: 2020-01-14
• arXiv.cs.DS Pub Date : 2020-01-11
Viet Vo; Shangqi Lai; Xingliang Yuan; Shi-Feng Sun; Surya Nepal; Joseph K. Liu

Searchable encryption (SE) is one of the key enablers for building encrypted databases. It allows a cloud server to search over encrypted data without decryption. Dynamic SE additionally includes data addition and deletion operations to enrich the functions of encrypted databases. Recent attacks exploiting the leakage in dynamic operations drive rapid development of new SE schemes that reveal less information while performing updates; these are also known as forward and backward private SE. Newly added data is no longer linkable to queries issued before, and deleted data is no longer searchable in queries issued later. However, those advanced SE schemes reduce the efficiency of SE, especially in the communication cost between the client and server. In this paper, we resort to a hardware-assisted solution, namely Intel SGX, to ease the above bottleneck. Our key idea is to leverage SGX to take over most of the client's tasks, i.e., tracking keyword states along with data addition and caching deleted data. However, handling large datasets is non-trivial due to the I/O and memory constraints of the SGX enclave. We further develop batch data processing and state compression techniques to reduce the communication overhead between the SGX and the untrusted server, and to minimise the memory footprint in the enclave. We conduct a comprehensive set of evaluations on both synthetic and real-world datasets, which confirm that our designs outperform the prior art.

Updated: 2020-01-14
• arXiv.cs.DS Pub Date : 2020-01-11
Raed Jaberi

Given a $2$-vertex-twinless connected directed graph $G=(V,E)$, the minimum $2$-vertex-twinless connected spanning subgraph problem is to find a minimum cardinality edge subset $E^{t} \subseteq E$ such that the subgraph $(V,E^{t})$ is $2$-vertex-twinless connected. Let $G^{1}$ be a minimal $2$-vertex-connected subgraph of $G$. In this paper we present a $(2+a_{t}/2)$-approximation algorithm for the minimum $2$-vertex-twinless connected spanning subgraph problem, where $a_{t}$ is the number of twinless articulation points in $G^{1}$.

Updated: 2020-01-14
• arXiv.cs.DS Pub Date : 2020-01-11
Pierre Aboulker; Édouard Bonnet; Eun Jung Kim; Florian Sikora

The first-fit coloring is a heuristic that assigns to each vertex, arriving in a specified order $\sigma$, the smallest available color. The problem Grundy Coloring asks how many colors are needed for the most adversarial vertex ordering $\sigma$, i.e., the maximum number of colors that the first-fit coloring requires over all possible vertex orderings. Since its inception by Grundy in 1939, Grundy Coloring has been examined for its structural and algorithmic aspects. A brute-force $f(k)n^{2^{k-1}}$-time algorithm for Grundy Coloring on general graphs is not difficult to obtain, where $k$ is the number of colors required by the most adversarial vertex ordering. It was asked several times whether the dependency on $k$ in the exponent of $n$ can be avoided or reduced, and the answer seemed elusive until now. We prove that Grundy Coloring is W[1]-hard and that the brute-force algorithm is essentially optimal under the Exponential Time Hypothesis, thus settling this question in the negative. The key ingredient in our W[1]-hardness proof is the use of so-called half-graphs as a building block to transmit a color from one vertex to another. Leveraging the half-graphs, we also prove that b-Chromatic Core is W[1]-hard, whose parameterized complexity was posed as an open question by Panolan et al. [JCSS '17]. A natural follow-up question is how the parameterized complexity changes in the absence of (large) half-graphs. We establish fixed-parameter tractability on $K_{t,t}$-free graphs for b-Chromatic Core and Partial Grundy Coloring, making a step toward answering this question. The key combinatorial lemma underlying the tractability result might be of independent interest.
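
A minimal sketch of the first-fit heuristic the problem is built on (illustrative code, not from the paper): given an ordering $\sigma$, each vertex receives the smallest color not used by its already-colored neighbors; the Grundy number is the maximum number of colors over all orderings.

```python
def first_fit_coloring(adj, order):
    """First-fit (greedy) coloring: scan vertices in the given order and
    give each the smallest color (0, 1, 2, ...) not already used on its
    colored neighbors.  adj maps each vertex to its neighbor list."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:          # smallest color absent from neighbors
            c += 1
        color[v] = c
    return color
```

On the path 0-1-2-3, for instance, the natural order uses 2 colors, but the adversarial order (0, 3, 1, 2) forces a third, showing how the ordering drives the Grundy number.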

Updated: 2020-01-14
• arXiv.cs.DS Pub Date : 2019-12-22
Liang Huang; He Zhang; Dezhong Deng; Kai Zhao; Kaibo Liu; David A. Hendrix; David H. Mathews

Motivation: Predicting the secondary structure of an RNA sequence is useful in many applications. Existing algorithms (based on dynamic programming) suffer from a major limitation: their runtimes scale cubically with the RNA length, and this slowness limits their use in genome-wide applications. Results: We present a novel alternative $O(n^3)$-time dynamic programming algorithm for RNA folding that is amenable to heuristics that make it run in $O(n)$ time and $O(n)$ space, while producing a high-quality approximation to the optimal solution. Inspired by incremental parsing for context-free grammars in computational linguistics, our alternative dynamic programming algorithm scans the sequence in a left-to-right (5'-to-3') direction rather than in a bottom-up fashion, which allows us to employ the effective beam pruning heuristic. Our work, though inexact, is the first RNA folding algorithm to achieve linear runtime (and linear space) without imposing constraints on the output structure. Surprisingly, our approximate search results in even higher overall accuracy on a diverse database of sequences with known structures. More interestingly, it leads to significantly more accurate predictions on the longest sequence families in that database (16S and 23S Ribosomal RNAs), as well as improved accuracies for long-range base pairs (500+ nucleotides apart), both of which are well known to be challenging for the current models. Availability: Our source code is available at https://github.com/LinearFold/LinearFold, and our webserver is at http://linearfold.org (sequence limit: 100,000nt).

Updated: 2020-01-14
• arXiv.cs.DS Pub Date : 2020-01-13
Johannes K. Fichte; Markus Hecher; Patrick Thier; Stefan Woltran

Bounded treewidth is one of the most cited combinatorial invariants and has been applied in the literature to solve several counting problems efficiently. A canonical counting problem is #SAT, which asks to count the satisfying assignments of a Boolean formula. Recent work shows that benchmarking instances for #SAT often have reasonably small treewidth. This paper deals with counting problems for instances of small treewidth. We introduce a general framework to solve counting questions based on state-of-the-art database management systems (DBMSs). Our framework takes explicit advantage of small treewidth by solving instances using dynamic programming (DP) on tree decompositions (TDs). To this end, we implement the concept of DP in a DBMS (PostgreSQL), since DP algorithms are often already given in terms of table manipulations in theory. This allows for elegant specifications of DP algorithms and the use of SQL to manipulate records and tables, which gives us a natural approach to bring DP algorithms into practice. To the best of our knowledge, we present the first approach to employ a DBMS for algorithms on TDs. A key advantage of our approach is that DBMSs naturally allow one to deal with huge tables with a limited amount of main memory (RAM), parallelization, as well as suspending computation.

Updated: 2020-01-14
• arXiv.cs.DS Pub Date : 2020-01-13
Markus Hecher; Michael Morak; Stefan Woltran

Epistemic logic programs (ELPs) are a popular generalization of standard Answer Set Programming (ASP) providing means for reasoning over answer sets within the language. This richer formalism comes at the price of higher computational complexity, reaching up to the fourth level of the polynomial hierarchy. However, in contrast to standard ASP, dedicated investigations towards tractability have not been undertaken yet. In this paper, we give the first results in this direction and show that central ELP problems can be solved in linear time for ELPs exhibiting structural properties in terms of bounded treewidth. We also provide a full dynamic programming algorithm that adheres to these bounds. Finally, we show that applying treewidth to a novel dependency structure---given in terms of epistemic literals---allows us to bound the number of ASP solver calls in typical ELP solving procedures.

Updated: 2020-01-14
• arXiv.cs.DS Pub Date : 2020-01-13
Marcin Jurdziński; Rémi Morvan

An attractor decomposition meta-algorithm for solving parity games is given that generalizes the classic McNaughton-Zielonka algorithm and its recent quasi-polynomial variants due to Parys (2019), and to Lehtinen, Schewe, and Wojtczak (2019). The central concepts studied and exploited are attractor decompositions of dominia in parity games and the ordered trees that describe the inductive structure of attractor decompositions. The main technical results include the embeddable decomposition theorem and the dominion separation theorem that together help establish a precise structural condition for the correctness of the universal algorithm: it suffices that the two ordered trees given to the algorithm as inputs embed the trees of some attractor decompositions of the largest dominia for each of the two players, respectively. The universal algorithm yields McNaughton-Zielonka, Parys's, and Lehtinen-Schewe-Wojtczak algorithms as special cases when suitable universal trees are given to it as inputs. The main technical results provide a unified proof of correctness and deep structural insights into those algorithms. A symbolic implementation of the universal algorithm is also given that improves the symbolic space complexity of solving parity games in quasi-polynomial time from $O(d \lg n)$---achieved by Chatterjee, Dvo\v{r}\'{a}k, Henzinger, and Svozil (2018)---down to $O(\lg d)$, where $n$ is the number of vertices and $d$ is the number of distinct priorities in a parity game. This not only exponentially improves the dependence on $d$, but it also entirely removes the dependence on $n$.

Updated: 2020-01-14
• arXiv.cs.DS Pub Date : 2020-01-13
Zeyuan Allen-Zhu; Yuanzhi Li

How does a 110-layer ResNet learn a high-complexity classifier using relatively few training examples and short training time? We present a theory towards explaining this in terms of $\textit{hierarchical learning}$. By hierarchical learning we mean that the learner learns to represent a complicated target function by decomposing it into a sequence of simpler functions, reducing sample and time complexity. This paper formally analyzes how multi-layer neural networks can perform such hierarchical learning efficiently and automatically simply by applying stochastic gradient descent (SGD). On the conceptual side, we present, to the best of our knowledge, the FIRST theory result indicating how very deep neural networks can still be sample and time efficient on certain hierarchical learning tasks, when NO KNOWN non-hierarchical algorithms (such as kernel method, linear regression over feature mappings, tensor decomposition, sparse coding) are efficient. We establish a new principle called "backward feature correction", which we believe is the key to understanding hierarchical learning in multi-layer neural networks. On the technical side, we show for regression and even for binary classification, for every input dimension $d > 0$, there is a concept class consisting of degree $\omega(1)$ multi-variate polynomials so that, using $\omega(1)$-layer neural networks as learners, SGD can learn any target function from this class in $\mathsf{poly}(d)$ time using $\mathsf{poly}(d)$ samples to any $\frac{1}{\mathsf{poly}(d)}$ error, through learning to represent it as a composition of $\omega(1)$ layers of quadratic functions. In contrast, we present lower bounds stating that several non-hierarchical learners, including any kernel methods and neural tangent kernels, must suffer from $d^{\omega(1)}$ sample or time complexity to learn functions in this concept class even to any $d^{-0.01}$ error.

Updated: 2020-01-14
• arXiv.cs.DS Pub Date : 2020-01-13
Arnold Filtser

A partition $\mathcal{P}$ of a weighted graph $G$ is $(\sigma,\tau,\Delta)$-sparse if every cluster has diameter at most $\Delta$, and every ball of radius $\Delta/\sigma$ intersects at most $\tau$ clusters. Similarly, $\mathcal{P}$ is $(\sigma,\tau,\Delta)$-scattering if instead of balls we require that every shortest path of length at most $\Delta/\sigma$ intersects at most $\tau$ clusters. Given a graph $G$ that admits a $(\sigma,\tau,\Delta)$-sparse partition for all $\Delta>0$, Jia et al. [STOC05] constructed a solution for the Universal Steiner Tree problem (and also Universal TSP) with stretch $O(\tau\sigma^2\log_\tau n)$. Given a graph $G$ that admits a $(\sigma,\tau,\Delta)$-scattering partition for all $\Delta>0$, we construct a solution for the Steiner Point Removal problem with stretch $O(\tau^3\sigma^3)$. We then construct sparse and scattering partitions for various graph families, obtaining many new results for the Universal Steiner Tree and Steiner Point Removal problems.

Updated: 2020-01-14
• arXiv.cs.DS Pub Date : 2018-05-07
Merav Parter

In this paper, we present improved algorithms for the $(\Delta+1)$ (vertex) coloring problem in the Congested-Clique model of distributed computing. In this model, the input is a graph on $n$ nodes, initially each node knows only its incident edges, and per round each two nodes can exchange $O(\log n)$ bits of information. Our key result is a randomized $(\Delta+1)$ vertex coloring algorithm that works in $O(\log\log \Delta \cdot \log^* \Delta)$-rounds. This is achieved by combining the recent breakthrough result of [Chang-Li-Pettie, STOC'18] in the LOCAL model and a degree reduction technique. We also get the following results with high probability: (1) $(\Delta+1)$-coloring for $\Delta=O((n/\log n)^{1-\epsilon})$ for any $\epsilon \in (0,1)$, within $O(\log(1/\epsilon)\log^* \Delta)$ rounds, and (2) $(\Delta+\Delta^{1/2+o(1)})$-coloring within $O(\log^* \Delta)$ rounds. Turning to deterministic algorithms, we show a $(\Delta+1)$-coloring algorithm that works in $O(\log \Delta)$ rounds.

Updated: 2020-01-14
• arXiv.cs.DS Pub Date : 2019-07-03
Panagiotis Charalampopoulos; Tomasz Kociumaka; Solon P. Pissis; Jakub Radoszewski; Wojciech Rytter; Juliusz Straszyński; Tomasz Waleń; Wiktor Zuba

The $k$-mismatch problem consists in computing the Hamming distance between a pattern $P$ of length $m$ and every length-$m$ substring of a text $T$ of length $n$, if this distance is no more than $k$. In many real-world applications, any cyclic rotation of $P$ is a relevant pattern, and thus one is interested in computing the minimal distance of every length-$m$ substring of $T$ and any cyclic rotation of $P$. This is the circular pattern matching with $k$ mismatches ($k$-CPM) problem. A multitude of papers have been devoted to solving this problem but, to the best of our knowledge, only average-case upper bounds are known. In this paper, we present the first non-trivial worst-case upper bounds for the $k$-CPM problem. Specifically, we show an $O(nk)$-time algorithm and an $O(n+\frac{n}{m}\,k^4)$-time algorithm. The latter algorithm applies in an extended way a technique that was very recently developed for the $k$-mismatch problem [Bringmann et al., SODA 2019]. A preliminary version of this work appeared at FCT 2019. In this version we improve the time complexity of the main algorithm from $O(n+\frac{n}{m}\,k^5)$ to $O(n+\frac{n}{m}\,k^4)$.
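
To make the problem statement concrete, here is a brute-force baseline for $k$-CPM (quadratic in the worst case; it is not the paper's $O(nk)$ or $O(n+\frac{n}{m}\,k^4)$ algorithm), comparing every length-$m$ window of the text against every rotation of the pattern:

```python
def k_cpm_brute(text, pattern, k):
    """Brute-force circular pattern matching with k mismatches: report
    (position, distance) for every length-m window of the text whose
    minimal Hamming distance to any cyclic rotation of the pattern is
    at most k."""
    m = len(pattern)
    rotations = {pattern[r:] + pattern[:r] for r in range(m)}
    hits = []
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        best = min(sum(a != b for a, b in zip(window, rot))
                   for rot in rotations)
        if best <= k:
            hits.append((i, best))
    return hits
```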

Updated: 2020-01-14
• arXiv.cs.DS Pub Date : 2019-09-24
Rasmus Pagh; Johan Sivertsen

Motivated by the problem of filtering candidate pairs in inner product similarity joins we study the following inner product estimation problem: Given parameters $d\in {\bf N}$, $\alpha>\beta\geq 0$ and unit vectors $x,y\in {\bf R}^{d}$ consider the task of distinguishing between the cases $\langle x, y\rangle\leq\beta$ and $\langle x, y\rangle\geq \alpha$ where $\langle x, y\rangle = \sum_{i=1}^d x_i y_i$ is the inner product of vectors $x$ and $y$. The goal is to distinguish these cases based on information on each vector encoded independently in a bit string of the shortest length possible. In contrast to much work on compressing vectors using randomized dimensionality reduction, we seek to solve the problem deterministically, with no probability of error. Inner product estimation can be solved in general via estimating $\langle x, y\rangle$ with an additive error bounded by $\varepsilon = \alpha - \beta$. We show that $d \log_2 \left(\tfrac{\sqrt{1-\beta}}{\varepsilon}\right) \pm \Theta(d)$ bits of information about each vector is necessary and sufficient. Our upper bound is constructive and improves a known upper bound of $d \log_2(1/\varepsilon) + O(d)$ by up to a factor of 2 when $\beta$ is close to $1$. The lower bound holds even in a stronger model where one of the vectors is known exactly, and an arbitrary estimation function is allowed.
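
A simple deterministic baseline for the compression side of this problem is plain coordinate-wise rounding, which spends a fixed number of bits per coordinate and incurs additive error at most $2^{-\text{bits}}$ per coordinate (an illustrative sketch, not the paper's optimal encoding):

```python
def quantize(x, bits):
    """Deterministically round each coordinate of a vector to a uniform
    grid using `bits` bits per coordinate; the per-coordinate rounding
    error is at most 2**-bits."""
    scale = 1 << (bits - 1)                 # grid step is 1/scale
    return [round(xi * scale) / scale for xi in x]
```

Inner products of the quantized vectors then approximate the true inner product up to an additive error that grows with $d$ and shrinks exponentially in the bit budget, which is the trade-off the paper's bounds make precise.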

Updated: 2020-01-14
• arXiv.cs.DS Pub Date : 2019-09-25
Zhuolun Xiang; Bolin Ding; Xi He; Jingren Zhou

Local differential privacy (LDP) enables private data sharing and analytics without the need for a trusted data collector. Error-optimal primitives (for, e.g., estimating means and item frequencies) under LDP have been well studied. For analytical tasks such as range queries, however, the best known error bound depends on the domain size of the private data, which is potentially prohibitive. This deficiency is inherent, as LDP protects the same level of indistinguishability between any pair of private data values for each data owner. In this paper, we utilize an extension of $\varepsilon$-LDP called Metric-LDP or $E$-LDP, where a metric $E$ defines heterogeneous privacy guarantees for different pairs of private data values and thus provides a more flexible knob than $\varepsilon$ does to relax LDP and tune utility-privacy trade-offs. We show that, under such privacy relaxations, for analytical workloads such as linear counting, multi-dimensional range counting queries, and quantile queries, we can achieve significant gains in utility. In particular, for range queries under $E$-LDP where the metric $E$ is the $\ell_1$-distance function scaled by $\varepsilon$, we design mechanisms with errors independent of the domain sizes; instead, their errors depend on the metric $E$, which specifies at what granularity the private data is protected. We believe that the primitives we design for $E$-LDP will be useful in developing mechanisms for other analytical tasks, and we encourage the adoption of LDP in practice.

Updated: 2020-01-14
• arXiv.cs.DS Pub Date : 2019-10-04
Antoine Amarilli; İsmail İlkan Ceylan

We study the problem of probabilistic query evaluation (PQE) over probabilistic graphs, namely, tuple-independent probabilistic databases (TIDs) on signatures of arity two. Our focus is the class of queries that is closed under homomorphisms, or equivalently, the infinite unions of conjunctive queries, denoted $\mathsf{UCQ}^\infty$. Our main result states that all unbounded queries in $\mathsf{UCQ}^\infty$ are #P-hard for PQE. As bounded queries in $\mathsf{UCQ}^\infty$ are already classified by the dichotomy of Dalvi and Suciu [17], our results and theirs imply a complete dichotomy on PQE for $\mathsf{UCQ}^\infty$ queries over probabilistic graphs. This dichotomy covers in particular all fragments of $\mathsf{UCQ}^\infty$ such as negation-free (disjunctive) Datalog, regular path queries, and a large class of ontology-mediated queries on arity-two signatures. Our result is shown by reducing from counting the valuations of positive partitioned 2-DNF formulae (#PP2DNF) for some queries, or from the source-to-target reliability problem in an undirected graph (#U-ST-CON) for other queries, depending on properties of minimal models.

Updated: 2020-01-14
• arXiv.cs.DS Pub Date : 2019-10-29
David Adjiashvili; Felix Hommelsheim; Moritz Mühlenthaler

Graph connectivity and network design problems are among the most fundamental problems in combinatorial optimization. The minimum spanning tree problem, the two edge-connected spanning subgraph problem (2-ECSS) and the tree augmentation problem (TAP) are all examples of fundamental well-studied network design tasks that postulate different initial states of the network and different assumptions on the reliability of network components. In this paper we motivate and study \emph{Flexible Graph Connectivity} (FGC), a problem that mixes together both the modeling power and the complexities of all aforementioned problems and more. In a nutshell, FGC asks to design a connected network, while allowing to specify different reliability levels for individual edges. While this non-uniform nature of the problem makes it appealing from the modeling perspective, it also renders most existing algorithmic tools for dealing with network design problems unfit for approximating FGC. In this paper we develop a general algorithmic approach for approximating FGC that yields approximation algorithms with ratios that are very close to the best known bounds for many special cases, such as 2-ECSS and TAP. Our algorithm and analysis combine various techniques including a weight-scaling algorithm, a charging argument that uses a variant of exchange bijections between spanning trees and a factor revealing min-max-min optimization problem.

Updated: 2020-01-14
• arXiv.cs.DS Pub Date : 2019-11-24
E. G. Kondakova; A. Ya. Kanel-Belov

In this paper, we continue to address some open questions posed in Ajans' dissertation: the traversal of integer spaces, in the presence of a stone and a subspace of flags, by a robot equipped with a random-bit generator. This work is devoted to maze traversal by a finite state machine with a random-bit generator. This task is part of the rapidly evolving topic of maze traversal by various finite state machines or their teams, which is closely related to problems from computational complexity theory and probability theory. We show for which dimensions a robot with a random-bit generator and a stone can traverse integer spaces with a subspace of flags. In particular, we prove that the robot can traverse $\mathbb{Z}^2$ but not $\mathbb{Z}^3$; a robot with a stone can traverse $\mathbb{Z}^4$ but not $\mathbb{Z}^5$; a robot with a stone and a flag can traverse $\mathbb{Z}^6$ but not $\mathbb{Z}^7$; and a robot with a stone and a plane of flags can traverse $\mathbb{Z}^8$ but not $\mathbb{Z}^9$.

Updated: 2020-01-14
• arXiv.cs.DS Pub Date : 2020-01-09
Baichuan Mo; Zhenliang Ma; Haris N. Koutsopoulos; Jinhua Zhao

Urban rail services are the principal means of public transportation in many cities. To understand the crowding patterns and develop efficient operation strategies for the system, obtaining path choices is important. This paper proposes an assignment-based path choice estimation framework using automated fare collection (AFC) data. The framework captures the inherent correlation of crowding among stations, as well as the interaction between path choice and being left behind. The path choice estimation is formulated as an optimization problem. The original problem is intractable because of a non-analytical constraint and a non-linear equality constraint. A solution procedure is proposed to decompose the original problem into three tractable sub-problems, which can be solved efficiently. The model is validated using both synthetic data and real-world AFC data from the Hong Kong Mass Transit Railway (MTR) system. The synthetic-data test validates the model's effectiveness in estimating path choice parameters, where it outperforms purely simulation-based optimization methods in both accuracy and efficiency. The test results using actual data show that the estimated path shares are more reasonable than survey-derived path shares and uniform path shares. Model robustness with respect to different initial values and different case-study dates is also verified.

Updated: 2020-01-13
• arXiv.cs.DS Pub Date : 2015-11-10
Luís M. S. Russo

We study the dynamic optimality conjecture, which predicts that splay trees are a universally efficient form of binary search tree for any access sequence. We reduce this claim to a regular access bound, which seems plausible and may be easier to prove. This approach may be useful for establishing dynamic optimality.

Updated: 2020-01-13
• arXiv.cs.DS Pub Date : 2017-09-23
Jonathan Jedwab; Tara Petrie; Samuel Simon

An RNA secondary structure is designable if there is an RNA sequence which can attain its maximum number of base pairs only by adopting that structure. The combinatorial RNA design problem, introduced by Hale\v{s} et al. in 2016, is to determine whether or not a given RNA secondary structure is designable. Hale\v{s} et al. identified certain classes of designable and non-designable secondary structures by reference to their corresponding rooted trees. We introduce an infinite class of rooted trees containing unpaired nucleotides at the greatest height, and prove constructively that their corresponding secondary structures are designable. This complements previous results for the combinatorial RNA design problem.

Updated: 2020-01-13
• arXiv.cs.DS Pub Date : 2018-01-21
Luís M. S. Russo; Andreia Sofia Teixeira; Alexandre P Francisco

We consider the problem of uniformly generating a spanning tree of a connected undirected graph. This process is useful for computing statistics, namely for phylogenetic trees. We describe a Markov chain for producing these trees. For cycle graphs we prove that this approach significantly outperforms existing algorithms. For general graphs we obtain no analytical bounds, but experimental results show that the chain still converges quickly. This yields an efficient algorithm, thanks also to the use of suitable fast data structures. To bound the mixing time of the chain we describe a coupling, which we analyse for cycle graphs and simulate for other graphs.
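The authors' Markov chain on spanning trees is their own construction; as a point of comparison, the classic Aldous-Broder random walk also samples a spanning tree uniformly at random. A minimal sketch (the function name and the toy cycle graph are illustrative, not from the paper):

```python
import random

def aldous_broder(adj, root=0, rng=None):
    """Sample a uniformly random spanning tree of a connected undirected
    graph via the Aldous-Broder walk: random-walk until every vertex has
    been visited; the first-entry edges form a uniform spanning tree."""
    rng = rng or random.Random()
    visited = {root}
    parent = {}
    u = root
    while len(visited) < len(adj):
        v = rng.choice(adj[u])
        if v not in visited:
            visited.add(v)
            parent[v] = u  # edge (u, v) enters the tree
        u = v
    return [(parent[v], v) for v in parent]

# usage: a 4-cycle 0-1-2-3-0
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
tree = aldous_broder(cycle, rng=random.Random(42))
```

For cycle graphs this amounts to deleting one uniformly random edge, which is the setting the paper analyses exactly.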

Updated: 2020-01-13
• arXiv.cs.DS Pub Date : 2018-02-15
Peter Chini; Roland Meyer; Prakash Saivasan

We study the fine-grained complexity of Leader Contributor Reachability (LCR) and Bounded-Stage Reachability (BSR), two variants of the safety verification problem for shared memory concurrent programs. For both problems, the memory is a single variable over a finite data domain. Our contributions are new verification algorithms and lower bounds. The latter are based on the Exponential Time Hypothesis (ETH), the problem Set Cover, and cross-compositions. LCR is the question whether a designated leader thread can reach an unsafe state when interacting with a certain number of equal contributor threads. We suggest two parameterizations: (1) By the size of the data domain D and the size of the leader L, and (2) by the size of the contributors C. We present algorithms for both cases. The key techniques are compact witnesses and dynamic programming. The algorithms run in O*((L(D+1))^(LD) * D^D) and O*(2^C) time, showing that both parameterizations are fixed-parameter tractable. We complement the upper bounds by (matching) lower bounds based on ETH and Set Cover. Moreover, we prove the absence of polynomial kernels. For BSR, we consider programs involving t different threads. We restrict the analysis to computations where the write permission changes s times between the threads. BSR asks whether a given configuration is reachable via such an s-stage computation. When parameterized by P, the maximum size of a thread, and t, the interesting observation is that the problem has a large number of difficult instances. Formally, we show that there is no polynomial kernel, no compression algorithm that reduces the size of the data domain D or the number of stages s to a polynomial dependence on P and t. This indicates that symbolic methods may be harder to find for this problem.

Updated: 2020-01-13
• arXiv.cs.DS Pub Date : 2019-02-18
P. A. M. Casares; M. A. Martin-Delgado

We introduce a new quantum optimization algorithm for dense Linear Programming problems, which can be seen as the quantization of the Interior Point Predictor-Corrector algorithm \cite{Predictor-Corrector} using a Quantum Linear System Algorithm \cite{DenseHHL}. The (worst case) work complexity of our method is, up to polylogarithmic factors, $O(L\sqrt{n}(n+m)\overline{||M||_F}\bar{\kappa}^2\epsilon^{-2})$ for $n$ the number of variables in the cost function, $m$ the number of constraints, $\epsilon^{-1}$ the target precision, $L$ the bit length of the input data, $\overline{||M||_F}$ an upper bound to the Frobenius norm of the linear systems of equations that appear, $||M||_F$, and $\bar{\kappa}$ an upper bound to the condition number $\kappa$ of those systems of equations. This represents a quantum speed-up in the number $n$ of variables in the cost function with respect to the comparable classical Interior Point algorithms when the initial matrix of the problem $A$ is dense: if we substitute the quantum part of the algorithm by classical algorithms such as Conjugate Gradient Descent, that would mean the whole algorithm has complexity $O(L\sqrt{n}(n+m)^2\bar{\kappa} \log(\epsilon^{-1}))$, or with exact methods, at least $O(L\sqrt{n}(n+m)^{2.373})$. Also, in contrast with any Quantum Linear System Algorithm, the algorithm described in this article outputs a classical description of the solution vector, and the value of the optimal solution.

Updated: 2020-01-13
• arXiv.cs.DS Pub Date : 2019-07-16
Deeksha Adil; Richard Peng; Sushant Sachdeva

Linear regression in $\ell_p$-norm is a canonical optimization problem that arises in several applications, including sparse recovery, semi-supervised learning, and signal processing. Generic convex optimization algorithms for solving $\ell_p$-regression are slow in practice. Iteratively Reweighted Least Squares (IRLS) is an easy-to-implement family of algorithms for solving these problems that has been studied for over 50 years. However, these algorithms often diverge for $p > 3$, and since the work of Osborne (1985), it has been an open problem whether there is an IRLS algorithm that is guaranteed to converge rapidly for $p > 3$. We propose p-IRLS, the first IRLS algorithm that provably converges geometrically for any $p \in [2,\infty).$ Our algorithm is simple to implement and is guaranteed to find a $(1+\varepsilon)$-approximate solution in $O(p^{3.5} m^{\frac{p-2}{2(p-1)}} \log \frac{m}{\varepsilon}) \le O_p(\sqrt{m} \log \frac{m}{\varepsilon} )$ iterations. Our experiments demonstrate that it performs even better than our theoretical bounds, beats the standard Matlab/CVX implementation for solving these problems by 10--50x, and is the fastest among available implementations in the high-accuracy regime.
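This is not the authors' p-IRLS (whose specific modifications secure geometric convergence), but the underlying reweighting idea can be sketched for a one-variable $\ell_p$ problem. The simple damping used below to stabilize the iteration for $p > 3$ is an illustrative choice, not the paper's scheme:

```python
def irls_1d(a, b, p=4, iters=50, damping=0.5, eps=1e-8):
    """Textbook IRLS for min_x sum_i |a[i]*x - b[i]|**p in one variable.
    Each round solves the weighted least-squares problem with weights
    w_i = |residual_i|**(p - 2); plain IRLS can oscillate for p > 3,
    so a simple damped update is used here (illustrative only)."""
    x = 0.0
    for _ in range(iters):
        w = [max(abs(ai * x - bi), eps) ** (p - 2) for ai, bi in zip(a, b)]
        num = sum(wi * ai * bi for wi, ai, bi in zip(w, a, b))
        den = sum(wi * ai * ai for wi, ai in zip(w, a))
        x += damping * (num / den - x)  # damped move toward the WLS solution
    return x

# toy instance: minimize |x|^4 + |x - 1|^4, whose optimum is x = 0.5
x_hat = irls_1d([1.0, 1.0], [0.0, 1.0], p=4)
```

Without damping, this toy instance already oscillates between the two data points, illustrating the divergence for $p > 3$ mentioned above.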

Updated: 2020-01-13
• arXiv.cs.DS Pub Date : 2019-08-17
Shinsaku Sakaue

Submodular maximization with a cardinality constraint can model various problems, and those problems are often very large in practice. For the case where objective functions are monotone, many fast approximation algorithms have been developed. The stochastic greedy algorithm (SG) is one such algorithm, which is widely used thanks to its simplicity, efficiency, and high empirical performance. However, its approximation guarantee has been proved only for monotone objective functions. When it comes to non-monotone objective functions, existing approximation algorithms are inefficient relative to the fast algorithms developed for the case of monotone objectives. In this paper, we prove that SG (with slight modification) can achieve almost $1/4$-approximation guarantees in expectation in linear time even if objective functions are non-monotone. Our result provides a constant-factor approximation algorithm with the fewest oracle queries for non-monotone submodular maximization with a cardinality constraint. Experiments validate the performance of (modified) SG.
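The standard stochastic greedy of Mirzasoleiman et al. (for monotone objectives) can be sketched as follows; the paper's modification for non-monotone objectives is not reproduced here, and the coverage objective is a made-up toy:

```python
import math
import random

def stochastic_greedy(ground, f, k, eps=0.1, rng=None):
    """Standard stochastic greedy: each of the k rounds draws a random
    sample of about (n/k)*log(1/eps) remaining elements and adds the
    one with the largest marginal gain, so only O(n log(1/eps)) oracle
    queries are made in total."""
    rng = rng or random.Random()
    n = len(ground)
    s = min(n, math.ceil((n / k) * math.log(1 / eps)))
    chosen, remaining = [], list(ground)
    for _ in range(k):
        sample = rng.sample(remaining, min(s, len(remaining)))
        base = f(chosen)
        best = max(sample, key=lambda e: f(chosen + [e]) - base)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# toy monotone coverage objective over four candidate sets
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1}}
def cover(S):
    out = set()
    for i in S:
        out |= sets[i]
    return len(out)

picked = stochastic_greedy(list(sets), cover, k=2, rng=random.Random(0))
```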

Updated: 2020-01-13
• arXiv.cs.DS Pub Date : 2019-10-31
Uwe Baier; Thomas Büchler; Enno Ohlebusch; Pascal Weber

This paper introduces the de Bruijn graph edge minimization problem, which is related to the compression of de Bruijn graphs: find the order-k de Bruijn graph with minimum edge count among all orders. We describe an efficient algorithm that solves this problem. Since the edge minimization problem is connected to the BWT compression technique called tunneling, the paper also describes a way to minimize the length of a tunneled BWT in such a way that useful properties for sequence analysis are preserved. Although not a complete solution, this is significant and practically usable progress on the open problem of finding optimal disjoint blocks that minimize space, as stated in Alanko et al. (DCC 2019).

Updated: 2020-01-13
• arXiv.cs.DS Pub Date : 2020-01-08
Alice Paul; David P. Williamson

In this note, we consider the capacitated facility location problem when the transportation costs of the instance satisfy the Monge property. We show that a straightforward dynamic program finds the optimal solution when the demands are polynomially bounded. When demands are not polynomially bounded, we give a fully polynomial-time approximation scheme by adapting an algorithm and analysis of Van Hoesel and Wagelmans.
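For reference, the Monge property on a cost matrix can be verified on adjacent 2x2 submatrices, from which the general quadrangle inequality follows by summation. A small illustrative check (not the paper's dynamic program):

```python
def is_monge(c):
    """Monge property: c[i][j] + c[i+1][j+1] <= c[i][j+1] + c[i+1][j]
    for all i, j.  Checking adjacent rows and columns suffices, since
    the general inequality is a sum of these adjacent ones."""
    return all(c[i][j] + c[i + 1][j + 1] <= c[i][j + 1] + c[i + 1][j]
               for i in range(len(c) - 1) for j in range(len(c[0]) - 1))

# line-metric transportation costs |i - j| satisfy the Monge property,
# since any convex function of i - j does
line = [[abs(i - j) for j in range(5)] for i in range(5)]
```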

Updated: 2020-01-10
• arXiv.cs.DS Pub Date : 2020-01-09
Nate Veldt; Austin R. Benson; Jon Kleinberg

The minimum $s$-$t$ cut problem in graphs is one of the most fundamental problems in combinatorial optimization, and graph cuts underlie algorithms throughout discrete mathematics, theoretical computer science, operations research, and data science. While graphs are a standard model for pairwise relationships, hypergraphs provide the flexibility to model multi-way relationships, and are now a standard model for complex data and systems. However, when generalizing from graphs to hypergraphs, the notion of a "cut hyperedge" is less clear, as a hyperedge's nodes can be split in several ways. Here, we develop a framework for hypergraph cuts by considering the problem of separating two terminal nodes in a hypergraph in a way that minimizes a sum of penalties at split hyperedges. In our setup, different ways of splitting the same hyperedge have different penalties, and the penalty is encoded by what we call a splitting function. Our framework opens a rich space on the foundations of hypergraph cuts. We first identify a natural class of cardinality-based hyperedge splitting functions that depend only on the number of nodes on each side of the split. In this case, we show that the general hypergraph $s$-$t$ cut problem can be reduced to a tractable graph $s$-$t$ cut problem if and only if the splitting functions are submodular. We also identify a wide regime of non-submodular splitting functions for which the problem is NP-hard. We also analyze extensions to multiway cuts with at least three terminal nodes and identify a natural class of splitting functions for which the problem can be reduced in an approximation-preserving way to the node-weighted multiway cut problem in graphs, again subject to a submodularity property. Finally, we outline several open questions on general hypergraph cut problems.
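The definitions can be made concrete with a tiny brute force: enumerate node sets containing $s$ but not $t$, and charge each hyperedge by a splitting function of how its nodes are divided. The example penalty min(a, b) is one cardinality-based submodular choice; the function names and the instance are illustrative, not from the paper:

```python
from itertools import combinations

def min_st_hypercut(nodes, hyperedges, s, t, penalty):
    """Brute-force minimum s-t hypergraph cut: try every node set S
    with s in S and t outside, charging each hyperedge e the penalty
    of its split, penalty(|e & S|, |e - S|).  Exponential time, so
    only suitable for tiny instances."""
    others = [v for v in nodes if v not in (s, t)]
    best = float('inf')
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            side = {s, *extra}
            cost = sum(penalty(len(e & side), len(e - side))
                       for e in hyperedges)
            best = min(best, cost)
    return best

# cardinality-based, submodular splitting function: count the
# smaller side of each split hyperedge
split = lambda a, b: min(a, b)
H = [frozenset({'s', 'u', 'v'}), frozenset({'u', 'v', 't'})]
cut = min_st_hypercut(['s', 'u', 'v', 't'], H, 's', 't', split)
```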

Updated: 2020-01-10
Contents have been reproduced by permission of the publishers.
