Introduction

Pattern matching is one of the core algorithms in computer science that stand to benefit from quantum computers1,2. Pattern matching algorithms are used ubiquitously in image processing3,4, the study of DNA sequences5, and data compression and statistics6, to name a few. Thus, accelerating pattern matching using a quantum computer would be a boon to all these areas.

The simplest form of pattern matching is string matching. In string matching, given a long string \({\mathcal{T}}\) of length N, we search for a pattern \({\mathcal{P}}\) of length M with M ≤ N (ref. 7). Depending on the application, we may need to search for an exact match, a fuzzy match, or a match with some wildcards8.

The best-known classical algorithm for string matching is the Knuth-Morris-Pratt algorithm, which has a worst-case time complexity of Θ(N + M) (refs 9,10). The best-known algorithms for approximate string matching have a similar run-time of Θ(N + M). For random strings, the exact matching complexity is lower bounded by \({{\Omega }}((N/M)\mathrm{log}\,(M))\) (ref. 11).

Ramesh and Vinay developed an exact string matching quantum algorithm with a query complexity of \(\tilde{O}(\sqrt{N}+\sqrt{M})\) (ref. 1). This algorithm uses Grover’s search to identify the position at which a segment of length M from \({\mathcal{T}}\) matches the pattern \({\mathcal{P}}\), where each of the checks is done using a nested Grover search. However, this work does not construct the required oracles explicitly, and the total time complexity, measured in units of gate depth, is bound to increase once we account for the gate-level complexity of accessing the text and pattern from a database. Another approach, which relies on a quantum solver for the dihedral hidden subgroup problem12, has a time complexity of \(\tilde{O}({(N/M)}^{1/2}{2}^{O(\sqrt{{\mathrm{log}}\,(M)})})\) for average-case matching (ref. 13). This work also assumes that M is larger than the logarithm of the length N, i.e., \(M=\omega (\mathrm{log}\,N)\), and fails with a high probability for certain worst-case inputs. In our work, we do not make any assumptions on the length of the pattern or the distribution of inputs.

In this paper, we present a string-matching algorithm, based on generalized Grover’s amplitude amplification14, with a time complexity of \(\tilde{O}(\sqrt{N})\) for arbitrary text length N and pattern length M ≤ N. Note that our algorithm does not rely on a quantum database and therefore incurs no database initialization overhead, expected to be O(N), that would overshadow any quantum advantage. The techniques we develop for our algorithm can readily be extended to solve pattern matching problems in higher dimensions. Over the course of detailing each step of our algorithm, we also provide gate-by-gate instructions for constructing the relevant quantum circuits. This allows us to straightforwardly obtain a concrete estimate of the total gate counts. The gate counts we report help establish when we may expect quantum computers to be of help in the problem space of pattern matching.

Our paper is organized as follows. To motivate the readers, we first compare our main results that are derived in the remainder of the paper with the current state of the art. After the comparison, we provide an outline of our string-matching algorithm. In Section “Results”, we provide the details of the algorithm, including the explicit circuits for all necessary oracles. We then calculate the overall complexity of our algorithm. We provide an estimate for gate counts in terms of CNOT and T gates, useful for pre-fault-tolerant and fault-tolerant regimes, respectively. We summarize our paper in Section “Discussion” and discuss the implications of our results.

We start by pointing out that our work differs from ref. 13, whose algorithm targets average-case inputs, in that we, as in ref. 1, provide a quantum algorithm for pattern matching for worst-case inputs. The work in ref. 13 further assumes \(M=\omega (\mathrm{log}\,(N))\), whereas the work in ref. 1 and the work reported in this manuscript do not. We rely on a Grover oracle (see Section “Grover oracle”) that simply checks if a state is an all-zero state in the computational basis, whereas the oracles in refs 1,13 are random memory access oracles of the form \({\sum }_{i}\left|i\right\rangle \left|0\right\rangle \to {\sum }_{i}\left|i\right\rangle \left|{t}_{i}\right\rangle\), where \({t}_{i}\) is the ith bit of the text. We are unaware of an efficient quantum circuit that implements such an oracle (see Section G.4 of the appendix of ref. 15 for the best-known construction) without resorting to quantum random access memory (QRAM)16. The known blueprints for QRAM16 have polylogarithmic time complexity in the size of the memory to be accessed. In our case, the size of the memory is O(N) and, therefore, QRAM queries will incur an additional multiplicative cost of at least \(O({(\mathrm{log}\,N)}^{2})\). Moreover, we would also have to account for the cost of initializing the quantum memory, which is expected to take a number of operations linear in N (ref. 17). In contrast, our algorithm does not assume any random access oracles. We also provide an explicit circuit for the Grover oracle we need using elementary quantum gates, specifically single-qubit Clifford, T, and CNOT gates.

Note that the algorithm in ref. 13 fails with a probability O(1/N) over the choice of \({\mathcal{T}}\) and \({\mathcal{P}}\). For certain worst-case \({\mathcal{T}}\) and \({\mathcal{P}}\), the algorithm inherently fails to return a match. In addition, there is internal randomness in the algorithm which contributes to an additional probability of failure. Our algorithm also fails with probability O(1/N) if there is a match between \({\mathcal{T}}\) and \({\mathcal{P}}\), but this is purely due to the internal randomness of Grover’s algorithm. We can simply repeat the algorithm to make the failure probability arbitrarily small, with an average repetition number of N/(N − 1). We make no assumptions on the distribution of text and pattern, and the algorithm works for all possible inputs. This may be contrasted with ref. 13, where the failure probability for the worst-case inputs cannot be suppressed by repeated use of the algorithm.

Our algorithm has a space complexity of O(N + M) since we need N (M) qubits to store the text (pattern). With M ≤ N, we may omit the M dependence and simplify it to O(N). The space complexities of refs 1,13 depend on the space complexity of the oracle. Assuming an N-bit register containing the text to be searched over is prepared in QRAM, in the bucket-brigade model, the bulk of the space complexity comes from routing qutrits, where random access over N bits of information requires O(N) routing qutrits. Expending a constant number of qubits for each qutrit, the space complexities of refs 1,13 are Ω(N), and likely Θ(N).

Finally, unlike the two prior works, the simplicity of our algorithm allows us to not just provide an explicit circuit-level blueprint for the algorithm but also estimate the quantum resources needed to implement it. A summary of the comparison between our work and refs 1,13 is given in Table 1.

Table 1 Comparison of our work with prior algorithms discussed in this paper.

In the remainder of this section, we outline the steps of our algorithm. The detailed implementation is presented in Section “Results”.

  1.

    Initialize two quantum registers to

    $$\left|{t}_{0}{t}_{1}{t}_{2}\ldots {t}_{N-1}\right\rangle \left|{p}_{0}{p}_{1}\ldots {p}_{M-1}\right\rangle ,$$

    where \({t}_{i}\) and \({p}_{i}\) denote the ith bit of string \({\mathcal{T}}\) and pattern \({\mathcal{P}}\), respectively.

  2.

    Transform the first register containing the string \({\mathcal{T}}\) into a superposition of N states, where each state is a bit-shifted state of the original state of the first register, shifted by 0, 1, 2, ..., N − 1 bits. This results in, assuming modulo-N space for the bit indices,

    $$\left(\frac{1}{\sqrt{N}}\mathop{\sum }\limits_{k = 0}^{N-1}\left|{t}_{0+k}{t}_{1+k}{t}_{2+k}\ldots {t}_{N-1+k}\right\rangle \right)\left|{p}_{0}{p}_{1}\ldots {p}_{M-1}\right\rangle$$
    (1)
  3.

    Compute XOR between the first M bits of the first register and all M bits of the second register to obtain

    $$\begin{array}{ll}&\frac{1}{\sqrt{N}}{\mathop{\sum}\limits_{k}}\left|{t}_{0+k}{t}_{1+k}\ldots {t}_{N-1+k}\right\rangle \\ &\left|({p}_{0}\oplus {t}_{0+k})({p}_{1}\oplus {t}_{1+k})\ldots ({p}_{M-1}\oplus {t}_{M-1+k})\right\rangle .\end{array}$$
    (2)
  4.

    In each branch of the superposition, the second register is all zeros if the pattern matches the first M bits of the shifted string \({\mathcal{T}}\). The register contains d ones if the shifted string and the pattern differ in d bit positions.

  5.

    Use the generalized Grover search or amplitude amplification14 to isolate the state where the second register has all zeros (when searching for an exact match) or has fewer than D mismatches (in the case of fuzzy search).
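The five steps above can be emulated classically by looping over the shifts that the quantum algorithm holds in superposition. The following is only a reference sketch for checking small inputs, not the quantum circuit itself; the function names `left_rotate` and `find_matches` and the parameter `max_mismatches` are illustrative assumptions, and the threshold is taken as "at most `max_mismatches` ones" in the XOR register.

```python
def left_rotate(bits, k):
    """Cyclic left shift: position i of the result holds bits[(i + k) mod N]."""
    k %= len(bits)
    return bits[k:] + bits[:k]


def find_matches(text, pattern, max_mismatches=0):
    """Return every shift k whose XOR register has Hamming weight <= max_mismatches."""
    n, m = len(text), len(pattern)
    assert m <= n, "the pattern must not be longer than the text"
    hits = []
    for k in range(n):                      # step 2: one branch per shift k
        shifted = left_rotate(text, k)
        # step 3: XOR the first M bits of the shifted text into the pattern
        xor_register = [p ^ t for p, t in zip(pattern, shifted[:m])]
        # steps 4-5: the branch is "marked" when the register is (nearly) all zeros
        if sum(xor_register) <= max_mismatches:
            hits.append(k)
    return hits


text = [0, 1, 1, 0, 1, 0, 0, 1]
pattern = [1, 0, 1]
print(find_matches(text, pattern))      # exact matches
print(find_matches(text, pattern, 1))   # matches with at most one mismatch
```

Because the bit indices are taken modulo N, a match that wraps around the end of the text (here the shift k = 7) is also reported.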

Results

In this section, we lay out the detailed implementation of the algorithm we outlined above. Specifically, we detail the transformations and registers used to implement the algorithm. One of the central transformations to be used in our algorithm is the cyclic shift operator. We present the details of its construction in Section “Construction of the cyclic-shift operator”. We also present the construction of the necessary Grover oracle in Section “Grover oracle” for completeness.

To encode a binary string \({\mathcal{T}}\) of length N and a binary pattern \({\mathcal{P}}\) of length M, we use quantum registers of N and M qubits, respectively. This can be done by using identity and bit-flip gates on a quantum register initialized as \({\left|0\right\rangle }^{\otimes (N+M)}\). Denoting the encoded states as

$$\begin{array}{ll}&\left|{\mathcal{T}}\right\rangle =\left|{t}_{0}{t}_{1}\ldots {t}_{N-1}\right\rangle =\mathop{\bigotimes}\limits_{i = 0}^{N-1}\left|{t}_{i}\right\rangle ,\\ &\left|{\mathcal{P}}\right\rangle =\left|{p}_{0}{p}_{1}\ldots {p}_{M-1}\right\rangle =\mathop{\bigotimes}\limits_{j = 0}^{M-1}\left|{p}_{j}\right\rangle ,\end{array}$$
(3)

where \({t}_{i}\) (\({p}_{i}\)) is the ith bit of string \({\mathcal{T}}\) (\({\mathcal{P}}\)), together with an index register of n qubits in the zero states, we prepare on a quantum computer a composite initial state

$$\left|\psi \right\rangle ={\left|0\right\rangle }^{\otimes n}\left[\mathop{\bigotimes}\limits_{i = 0}^{N-1}\left|{t}_{i}\right\rangle \right]\left[\mathop{\bigotimes}\limits_{j = 0}^{M-1}\left|{p}_{j}\right\rangle \right],$$
(4)

where, for convenience, we assumed \(N={2}^{n}\). Next, we apply an n-qubit Hadamard transform \({H}^{\otimes n}\) (or a Fourier transform in case of \(N\ne {2}^{n}\) for \(n\in {\mathbb{N}}\)) on the index register to produce a uniform superposition of \(\left|0\right\rangle ,\left|1\right\rangle ,\ldots ,\left|N-1\right\rangle\), i.e.,

$$\begin{array}{ll}&\left({H}^{\otimes n}{\left|0\right\rangle }^{\otimes n}\right)\left[\mathop{\bigotimes}\limits_{i = 0}^{N-1}\left|{t}_{i}\right\rangle \right]\left[\mathop{\bigotimes}\limits_{j = 0}^{M-1}\left|{p}_{j}\right\rangle \right]=\left(\frac{1}{\sqrt{N}}\mathop{\sum }\limits_{k = 0}^{N-1}\left|k\right\rangle \right)\left[\mathop{\bigotimes}\limits_{i = 0}^{N-1}\left|{t}_{i}\right\rangle \right]\left[\mathop{\bigotimes}\limits_{j = 0}^{M-1}\left|{p}_{j}\right\rangle \right].\end{array}$$
(5)

We now apply a cyclic shift operator \({\mathcal{S}}\) that left-circular shifts the qubits of the target state by k positions, where the values of k are encoded in the control state (see Section “Construction of the cyclic-shift operator” for details). Applying \({\mathcal{S}}\) on the first two registers results in

$$\begin{array}{ll}&\left[{\mathcal{S}}\left(\frac{1}{\sqrt{N}}\mathop{\sum}\limits_{k = 0}^{N-1}\left|k\right\rangle \right)\left(\mathop{\bigotimes}\limits_{i = 0}^{N-1}\left|{t}_{i}\right\rangle \right)\right]\left(\mathop{\bigotimes}\limits_{j = 0}^{M-1}\left|{p}_{j}\right\rangle \right)\\ &=\frac{1}{\sqrt{N}}\mathop{\sum}\limits_{k = 0}^{N-1}\left|k\right\rangle \left(\mathop{\bigotimes}\limits_{i = 0}^{N-1}\left|{t}_{i+k}\right\rangle \right)\left(\mathop{\bigotimes}\limits_{j = 0}^{M-1}\left|{p}_{j}\right\rangle \right).\end{array}$$
(6)

At this point, we check for a match between the cyclically shifted text string in the second register and the pattern string stored in the third register. We XOR each of the first M bits of the second register into the corresponding bit of the third register. If the XOR results are all zeros, the strings match. With the help of CNOT gates on a quantum computer, we then obtain, with an abuse of notation,

$$\begin{array}{ll}&\frac{1}{\sqrt{N}}\mathop{\sum}\limits_{k = 0}^{N-1}\left|k\right\rangle {\text{CNOT}}^{\otimes M}\left[\left(\mathop{\bigotimes}\limits_{i = 0}^{N-1}\left|{t}_{i+k}\right\rangle \right)\left(\mathop{\bigotimes}\limits_{j = 0}^{M-1}\left|{p}_{j}\right\rangle \right)\right]\\ &=\frac{1}{\sqrt{N}}\mathop{\sum}\limits_{k = 0}^{N-1}\left[\left|k\right\rangle \left(\mathop{\bigotimes}\limits_{i = 0}^{N-1}\left|{t}_{i+k}\right\rangle \right)\left(\mathop{\bigotimes}\limits_{j = 0}^{M-1}\left|{p}_{j}\oplus {t}_{j+k}\right\rangle \right)\right].\end{array}$$
(7)

The Hamming weight of the final register now equals the number of mismatches between the pattern and the first M bits of the shifted string register. Indeed, the register is all zeros if and only if those two string segments match completely.

We may now use the generalized Grover search or amplitude amplification14 to search for the state where the pattern register is in \(\left|0\right\rangle\) state (in the case of exact search). If this state is found, we know that the pattern occurs in the string. We also obtain the position from the index register where this match occurs. In addition to the exact match, we can also use this method to search for fuzzy matches or matches with wildcards by constructing appropriate Grover oracles.
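To illustrate how amplitude amplification concentrates probability on the matching shift, the following sketch tracks one amplitude per index value k. This is a simplifying assumption: after the shift and XOR steps the text and pattern registers are fixed functions of k, so the reflection about the prepared state acts like a reflection about the uniform superposition over k. The function name `amplify` and the round count ⌊π/(4θ)⌋ (the textbook choice for a known number of marked items) are illustrative, not taken from the construction above.

```python
import math


def amplify(text, pattern, d=0):
    """Emulate Grover iterations over the index register; return (most likely k, success probability)."""
    n, m = len(text), len(pattern)
    # marked[k] is True when the XOR register for shift k has Hamming weight <= d
    marked = [
        sum(pattern[j] ^ text[(j + k) % n] for j in range(m)) <= d
        for k in range(n)
    ]
    num_marked = sum(marked)
    if num_marked == 0:
        return None, 0.0
    amps = [1 / math.sqrt(n)] * n                      # uniform superposition over k
    theta = math.asin(math.sqrt(num_marked / n))
    rounds = math.floor(math.pi / (4 * theta))         # O(sqrt(N)) iterations
    for _ in range(rounds):
        amps = [-a if hit else a for a, hit in zip(amps, marked)]  # oracle phase flip
        mean = sum(amps) / n
        amps = [2 * mean - a for a in amps]            # reflection about the uniform state
    probs = [a * a for a in amps]
    success = sum(p for p, hit in zip(probs, marked) if hit)
    best = max(range(n), key=probs.__getitem__)
    return best, success


text = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0]
pattern = [1, 1, 0, 1]
k, p = amplify(text, pattern)
print(k, round(p, 3))
```

For this 16-bit text with a single match, three iterations raise the success probability from 1/16 to above 0.95, and measuring the index register then yields the match position.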

Construction of the cyclic-shift operator

In this subsection, we explicitly construct a circuit that implements the cyclic-shift operator \({\mathcal{S}}\). The two-register operator \({\mathcal{S}}\) is defined according to

$${\mathcal{S}}\left[\left|k\right\rangle \mathop{\bigotimes}\limits_{i=0}^{N-1}\left|{t}_{i}\right\rangle \right]=\left[\left|k\right\rangle \mathop{\bigotimes}\limits_{i=0}^{N-1}{{\mathcal{S}}}_{k}\left|{t}_{i}\right\rangle \right]=\left[\left|k\right\rangle \mathop{\bigotimes}\limits_{i=0}^{N-1}\left|{t}_{i+k}\right\rangle \right].$$
(8)

To implement the k-controlled circular shift operator \({{\mathcal{S}}}_{k}\), we consider k in its binary encoded form \(\left|k\right\rangle\) as \(\left|{k}_{0}\right\rangle \left|{k}_{1}\right\rangle \ldots \left|{k}_{n-1}\right\rangle\), such that \({2}^{0}{k}_{0}+{2}^{1}{k}_{1}+\ldots +{2}^{n-1}{k}_{n-1}=k\). The circular bitwise rotation by k in the second register can then be implemented by a product of controlled-shift operators that shift the target qubits by \({2}^{j}\) bits, conditioned on the qubit \(\left|{k}_{j}\right\rangle\). Using \({{\mathcal{S}}}_{a}{{\mathcal{S}}}_{b}={{\mathcal{S}}}_{a+b}\), we may now write

$$\left|k\right\rangle \mathop{\bigotimes}\limits_{i=0}^{N-1}{{\mathcal{S}}}_{k}\left|{t}_{i}\right\rangle =\left(\mathop{\bigotimes}\limits_{j=0}^{n-1}\left|{k}_{j}\right\rangle \right)\mathop{\bigotimes}\limits_{i=0}^{N-1}\left(\mathop{\prod }\limits_{j=0}^{n-1}{{\mathcal{S}}}_{{2}^{j}}^{({k}_{j})}\right)\left|{t}_{i}\right\rangle .$$
(9)

where \({{\mathcal{S}}}_{{2}^{j}}^{({k}_{j})}\) applies a shift of \({2}^{j}\) bits on the second register, which encodes the text \({\mathcal{T}}\), controlled by the jth qubit of the index register \(\left|k\right\rangle\). The circuit decomposition is shown in Fig. 1 as a visual guide.

Fig. 1: Circuit diagram for circular bitwise rotation operator Sk.
figure 1

A shift by k bits can be achieved by a product of \({\mathrm{log}}\,(k)\) controlled shift operations.
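The decomposition in (9) can be checked on classical bit strings: a shift by k equals the product of shifts by \({2}^{j}\), each applied only when the jth bit of k is one. The sketch below assumes \(N={2}^{n}\), as in the text; the function names are illustrative.

```python
def left_rotate(bits, s):
    """Cyclic left shift by s positions."""
    s %= len(bits)
    return bits[s:] + bits[:s]


def controlled_shift_cascade(bits, k):
    """Apply S_{2^j} for every j with k_j = 1, mirroring the product in Eq. (9)."""
    n_bits = len(bits)
    n = n_bits.bit_length() - 1
    assert 1 << n == n_bits, "this sketch assumes N is a power of two"
    assert 0 <= k < n_bits
    out = list(bits)
    for j in range(n):
        k_j = (k >> j) & 1          # jth qubit of the index register
        if k_j:                     # the shift fires only when the control is 1
            out = left_rotate(out, 1 << j)
    return out


bits = [0, 1, 1, 0, 1, 0, 0, 1]
for k in range(len(bits)):
    assert controlled_shift_cascade(bits, k) == left_rotate(bits, k)
print("cascade of controlled 2^j shifts reproduces every shift k")
```

Only n = log N controlled-shift blocks are needed, one per index qubit, regardless of the value of k.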

Together with (8), the decomposition shown in (9) reveals that, to implement the cyclic-shift operator \({\mathcal{S}}\), it suffices to consider the controlled bit-shift operators \({S}_{{2}^{j}}^{(c)}\) that circularly shift by \({2}^{j}\) bits for some j, conditioned on qubit c. To construct the circuit for \({S}_{{2}^{j}}^{(c)}\), we first consider an operator \({S}_{{2}^{j}}\) without any controls, which, as we show below, can be implemented using SWAP gates. We later promote the SWAP gates to a controlled version, effectively replacing the SWAP gates with controlled-SWAP (Fredkin) gates.

A circular shift operator \({S}_{s}\) by s bits applies a permutation \({P}_{s}\), in modulo N space, of the form

$${P}_{s}=\{N-s,N-s+1,N-s+2,\ldots ,N-s-1\},$$
(10)

where the (N − s)th bit is inserted in the zeroth position, the (N − s + 1)th bit is inserted in the first position, and so on. Any such permutation can be decomposed into a product of transpositions. As a result, a circular shift operation of the form (9) can be decomposed into a product of SWAP operations.
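As a concrete check of the transposition argument, the sketch below builds a list of SWAPs that realises the permutation in (10) by following its cycles; a cycle of length L contributes L − 1 transpositions. The cycle-following construction and the function names are illustrative choices, not the parallel layout used for the depth estimate.

```python
from math import gcd


def shift_as_swaps(n, s):
    """Return index pairs whose sequential SWAPs place old bit (i - s) mod n at position i."""
    source = [(i - s) % n for i in range(n)]   # position i receives the bit from source[i]
    seen = [False] * n
    swaps = []
    for start in range(n):
        cycle = []
        i = start
        while not seen[i]:                     # walk one cycle of the permutation
            seen[i] = True
            cycle.append(i)
            i = source[i]
        # a cycle of length L is a product of L - 1 transpositions
        swaps.extend(zip(cycle, cycle[1:]))
    return swaps


def apply_swaps(bits, swaps):
    out = list(bits)
    for a, b in swaps:
        out[a], out[b] = out[b], out[a]
    return out


n = 8
bits = list("abcdefgh")
for s in range(1, n):
    swaps = shift_as_swaps(n, s)
    assert apply_swaps(bits, swaps) == [bits[(i - s) % n] for i in range(n)]
    assert len(swaps) == n - gcd(n, s) <= n - 1
print("every shift of 8 positions uses at most 7 SWAPs")
```

The count n − gcd(n, s) never exceeds N − 1, which is the bound on the number of Fredkin gates used later in the gate-count estimate.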

We now calculate how many SWAP-operation layers are needed to efficiently apply the permutation of the form (10). With a register with N qubits, we can apply N/2 SWAP operations in parallel. Using the N/2-parallel SWAP operator, we can move N/2 qubits to their right positions in a single time step. This leaves us with sorting the remainder of N/2 bits. At each subsequent time step, the number of qubits that need to be swapped decreases by half. Therefore, we can arbitrarily permute N qubits in \(O(\mathrm{log}\,(N))\) time steps using parallel SWAP operations. A sample diagrammatic representation of this unitary operation is shown in Fig. 2. This implies that each of the controlled shift operators \({S}_{{2}^{j}}^{{k}_{j}}\) in (9) can be achieved in \(O(\mathrm{log}\,(N))\) time steps using parallel controlled-SWAP operators.

Fig. 2: A diagrammatic representation of the circular shift operator.
figure 2

In this example, we left circular shift a register of 8 qubits by 6 positions within two time steps. This kind of operation can in general be performed in depth \(\mathrm{log}\,(N)-1\) using parallel SWAP operations, where N is the size of the qubit register.

We next discuss a method to apply as many as N/2 parallel swap operations, controlled on the same qubit in the index register. As shown below, we achieve this at the cost of N/2 clean ancilla qubits.

We start by considering a fan-out CNOT operation, acting on the control qubit in a state \(\left|{k}_{j}\right\rangle\) and N/2 clean ancilla qubits initialized to \(\left|0\right\rangle\) as targets. This results in N/2 copies of \(\left|{k}_{j}\right\rangle\), which can then be used to implement up to N/2 Fredkin gates in a single time step. Once all necessary Fredkin gates have been implemented, we undo the fan-out operation and return all ancilla qubits to \(\left|0\right\rangle\) states. We recycle the freed-up ancilla qubits for the subsequent control qubits, one at a time.
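A minimal sketch of one way to schedule such a fan-out in logarithmic depth is given below: in every layer, each wire that already holds a copy of the control value acts as the control of one CNOT onto a fresh ancilla, so the number of copies doubles. The doubling schedule and the function names are assumptions made for illustration; the wires are simulated as classical bits, which is sufficient for a computational-basis control.

```python
def fan_out_layers(num_ancillas):
    """CNOT layers as lists of (control, target); wire 0 is the control, wires 1.. are ancillas."""
    total = num_ancillas + 1
    have = 1                                  # wires 0..have-1 already hold the control value
    layers = []
    while have < total:
        new = min(have, total - have)         # at most double the number of copies
        layers.append([(c, have + c) for c in range(new)])
        have += new
    return layers


def apply_layers(wires, layers):
    """Apply the CNOT layers to classical bit values."""
    out = list(wires)
    for layer in layers:
        for control, target in layer:
            out[target] ^= out[control]
    return out


layers = fan_out_layers(4)                    # N/2 = 4 ancillas for an 8-qubit text register
copied = apply_layers([1, 0, 0, 0, 0], layers)
restored = apply_layers(copied, layers[::-1]) # running the layers in reverse undoes the fan-out
print(len(layers), copied, restored)
```

With 4 ancillas the schedule uses 3 layers and 4 CNOT gates in total, and reversing the layers returns every ancilla to zero so that it can be reused for the next control qubit.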

The time cost of the fan-out operation is \(O(\mathrm{log}\,(N))\). Since there are \(O(\mathrm{log}\,(N))\) parallel SWAP layers required for the implementation of the qubit permutation discussed in Section “Construction of the cyclic-shift operator”, the overall time complexity of \({S}_{{2}^{j}}^{(c)}\) is \(O(\mathrm{log}\,(N))\).

Grover oracle

To complete our algorithm, we need a Grover oracle Uw that acts on the pattern register, required to amplify and help identify exact matches or close matches. The oracle may be defined according to

$${U}_{w}\left|{x}_{0}{x}_{1}\ldots {x}_{M-1}\right\rangle =\left\{\begin{array}{ll}-\left|{x}_{0}{x}_{1}\ldots {x}_{M-1}\right\rangle \ &\mathop{\sum }\limits_{i = 0}^{M-1}{x}_{i}\le d,\\ +\left|{x}_{0}{x}_{1}\ldots {x}_{M-1}\right\rangle \ &\mathop{\sum }\limits_{i = 0}^{M-1}{x}_{i}> d,\end{array}\right.$$
(11)

where d is zero if we desire to find exact matches and a small number if we desire to find close matches. Assuming an architecture that has long-range interactions, we can obtain this oracle in \(O({\mathrm{log}}\,(M))\) depth using O(M) ancilla qubits. We note in passing that there have also been proposals to implement a single-step n-control Toffoli that takes O(1) time in trapped-ion and neutral-atom architectures18. For the remainder of the paper, however, we take the circuit-depth complexity of this oracle to be \(O({\mathrm{log}}\,(M))\).
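The action of (11) on a superposition can be written out directly: basis states whose Hamming weight is at most d pick up a minus sign, and all others are left unchanged. The sketch below represents a state as a dictionary from bit tuples to amplitudes, which is an illustrative data layout and not the circuit construction.

```python
def apply_oracle(state, d=0):
    """Flip the sign of every basis state of the pattern register with Hamming weight <= d."""
    return {
        bits: (-amp if sum(bits) <= d else amp)
        for bits, amp in state.items()
    }


state = {
    (0, 0, 0): 0.5,   # exact match: no mismatching bit
    (0, 1, 0): 0.5,   # one mismatch
    (1, 1, 0): 0.5,   # two mismatches
    (1, 1, 1): 0.5,   # three mismatches
}
print(apply_oracle(state, d=0))   # only the all-zero state is marked
print(apply_oracle(state, d=1))   # states with at most one mismatch are marked
```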

Time complexity

In this subsection, we compute the time complexity of our algorithm. Encoding of strings \({\mathcal{T}}\) and \({\mathcal{P}}\) takes O(1) time. The Hadamard transformation applied to the index register takes O(1) time as well. The cyclic-shift operator \({\mathcal{S}}\) takes time \(O({({\mathrm{log}}\,(N))}^{2})\), since each \({S}_{{2}^{j}}^{({k}_{j})}\) operator, including the fan-out and its uncompute operation, takes \(O({\mathrm{log}}\,(N))\) time and \(j=0,1,2,...,{\mathrm{log}}\,(N)-1\). The evaluation of XOR results via CNOT gates takes time O(1), as it admits a straightforward parallel operation. Lastly, the Grover oracle has the complexity \(O(\mathrm{log}\,(M))\). The overall complexity of the steps considered so far, a single Grover step, is then \(O({({\mathrm{log}}\,(N))}^{2}+{\mathrm{log}}\,(M))\).

For the Grover search to be successful, we need to repeat the Grover steps \(O(\sqrt{N})\) times. This brings the total complexity to \(O(\sqrt{N}({({\mathrm{log}}\,(N))}^{2}+{\mathrm{log}}\,(M)))\).

Space complexity

In addition to the N and M qubits needed to encode the search string and the pattern, we need \(O(\mathrm{log}\,(N))\) qubits for the index register. For the depth-optimized implementation of our algorithm, we need N/2 ancilla qubits to fan out the index-register qubits. Furthermore, O(M) ancilla qubits are required for the depth-optimized Grover oracle implementation. Therefore, the space complexity of our string-matching algorithm is O(N + M).

Gate counts

In this section, we obtain an estimate for the gate count in terms of CNOT and T gates. We chose these two gates as metrics since two-qubit gates, such as CNOT, are widely expected to dominate the cost of implementation in the pre-fault-tolerant regime, whereas T gates are expected to dominate the cost of implementation in the fault-tolerant regime, assuming the standard gate set of Clifford + T.

The strings \({\mathcal{T}}\) and \({\mathcal{P}}\) can be encoded in qubits initially in the \(\left|0\right\rangle\) state using only identity and bit-flip (X) gates, and thus the encoding step has zero cost. A Hadamard transform of the index register in (5) needs \(\mathrm{log}\,(N)\) Hadamard gates, requiring zero cost as well. The cyclic shift operator \({\mathcal{S}}\) in (6) consists of \(\mathrm{log}\,(N)\) applications of \({S}_{s}^{(c)}\) operators. Each \({S}_{s}^{(c)}\) operator consists of a CNOT fan-out to N/2 − 1 target qubits, its inverse, and at most N − 1 Fredkin gates, since the permutation specified in (10) of size as large as N can be decomposed into at most N − 1 transpositions. As shown explicitly in Supplementary Note 1, based on circuit identities reported in refs 19,20, each Fredkin gate costs 7 CNOT gates and 7 T gates. Thus the cyclic shift operator costs at most \((8N-9)\mathrm{log}\,(N)\) CNOT gates and \([7(N-1)]\mathrm{log}\,(N)\) T gates. Next, the XOR operation in (7) takes M CNOT gates. Lastly, the Grover oracle of (11), using a parallelized version of the results reported in ref. 21 (see Supplementary Note 2 for details), can be implemented with 6M − 12 CNOT gates and 8M − 17 T gates with a linear overhead in ancilla qubits upper bounded by M − 3.

Finally, we need to repeat this \(\sqrt{N}\) times for amplitude amplification. The total CNOT and T count is, thus, given by

$$\begin{array}{ll}\#\,{\mathrm{CNOT}}\,=(7M-12+(8N-9){\mathrm{log}}\,(N))\times 2\sqrt{N},\\ \# \,{\mathrm{T}}\,=(8M-17+7(N-1){\mathrm{log}}\,(N))\times 2\sqrt{N},\end{array}$$
(12)

where the factor of 2 comes from the fact that for amplitude amplification, we need to apply a unitary to produce a state \(\left|\psi \right\rangle =U\left|0\right\rangle\) and also the inverse unitary \({U}^{\dagger }\).

Based on (12), we see that searching for a pattern with 20 ASCII characters (or 160 bits) in a text file that is 1 MB long would require about \({10}^{13}\) CNOT and T gates. Similarly, searching for a kilobyte-long pattern of a genetic signature in a genome sequence of 1 GB would require more than \({10}^{17}\) CNOT and T gates. We expect classical computers to outperform quantum computers for datasets of such length. However, for applications like matching templates in data generated by gravitational-wave experiments, which may be petabytes long (matching a megabyte-long signature in the petabyte-long text would require \({10}^{25}\) CNOT and T gates), we may expect to see the quantum advantage.
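The first two estimates can be reproduced by evaluating (12) directly. The sketch below assumes base-2 logarithms and decimal units (1 MB = 8 × 10^6 bits, 1 GB = 8 × 10^9 bits, 1 kB = 8000 bits); both are assumptions made here for illustration, and the function name `gate_counts` is hypothetical.

```python
import math


def gate_counts(n_bits, m_bits):
    """Evaluate Eq. (12): total CNOT and T counts for text length N and pattern length M in bits."""
    log_n = math.log2(n_bits)                # assumption: logarithms are base 2
    repetitions = 2 * math.sqrt(n_bits)      # sqrt(N) Grover steps, each applying U and its inverse
    cnot = (7 * m_bits - 12 + (8 * n_bits - 9) * log_n) * repetitions
    t_gates = (8 * m_bits - 17 + 7 * (n_bits - 1) * log_n) * repetitions
    return cnot, t_gates


# 20 ASCII characters (160 bits) in a 1 MB text
cnot, t_gates = gate_counts(8 * 10**6, 160)
print(f"1 MB text:  CNOT ~ {cnot:.1e}, T ~ {t_gates:.1e}")

# kilobyte-long pattern in a 1 GB text
cnot, t_gates = gate_counts(8 * 10**9, 8000)
print(f"1 GB text:  CNOT ~ {cnot:.1e}, T ~ {t_gates:.1e}")
```

In both cases the term proportional to \(N\,\mathrm{log}\,(N)\) from the cyclic-shift operator dominates, so the pattern length has little effect on the totals.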

Discussion

In this paper, we have constructed a quantum string-matching algorithm that admits a circuit-depth complexity of \(O(\sqrt{N}({(\mathrm{log}\,(N))}^{2}+\mathrm{log}\,(M)))\). We also provide an explicit gate-level implementation of our algorithm, enabling a concrete estimate of the quantum resources needed. The direct use cases of the matching algorithm range from a simple text search in a large file to detecting patterns in an image. The simple matching procedure can help, for example, in making intelligent recommendations based on pictures in a consumer device22, detecting defects in industrial lithography23, and detecting signals in large time-series data collected in experiments like the Laser Interferometer Gravitational-Wave Observatory24. In these applications, the typical size of data to be searched varies between \(\sim {10}^{6}\) and \(\sim {10}^{15}\) bytes. Our algorithm admits processing of such data sizes in \(\sim {\mathcal{C}}\times {({\mathrm{log}\,}_{2}(N))}^{2}\sqrt{N}\) time steps, where \({\mathcal{C}}\, <\,20\) and N is the number of bits in the data. We hope the speed-up provided by the quantum algorithm contributes to further advances in these areas.