Abstract
The pattern matching problem has been studied for decades and has become more demanding as text databases grow exponentially in size. An efficient deterministic classical algorithm requires at least \(\Omega\left( N \right)\) time, because in the worst case every character of the text must be examined. Quantum computation can act on a superposition of exponentially many basis states in a single step of execution, which is what makes quantum algorithms effective here. In general, a quantum pattern matching solution runs in \({\rm O}\left( {\sqrt N } \right)\) time because its design is based on Grover’s quantum search algorithm. To our knowledge, the existing quantum algorithms for single pattern matching have limitations, and no algorithm has been designed for multiple pattern matching. Our main objective is to design quantum algorithms for both single and multiple patterns on a processing architecture of quantum random access memory \(\left( {QuRAM} \right)\). This architecture gives a significant advantage for processing large text databases efficiently. Our complexity analysis shows that the quantum algorithmic solutions achieve a computational speedup over classical methods. We also summarize the emergence of quantum-based pattern matching algorithms for biological applications. In addition, simulations are performed to validate and analyze the performance of the proposed quantum algorithms. Lastly, we show that our algorithms outperform existing classical and quantum solutions and are suitable for implementation on a quantum computer.
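The \({\rm O}\left( {\sqrt N } \right)\) behaviour of Grover-based matching can be illustrated with a small classical simulation of Grover's algorithm over the candidate starting indices of a text. This is a sketch under stated assumptions, not the QuRAM-based algorithms of this article: it tracks amplitudes directly in Python instead of building a circuit, it builds the oracle predicate classically (so it shows amplitude dynamics, not a speedup), it assumes the number of matches \(M\) is known in advance so that \(\lfloor \frac{\pi}{4}\sqrt{N/M} \rfloor\) iterations can be chosen, and the function name `grover_pattern_match` is hypothetical.

```python
import math


def grover_pattern_match(text, pattern):
    """Classically simulate Grover search for an exact occurrence of `pattern` in `text`.

    Returns (most_likely_start_index, its_probability), or None if there is no match.
    """
    m = len(pattern)
    last = len(text) - m  # last valid starting index
    if m == 0 or last < 0:
        return None
    # Pad the index register to N = 2**n_qubits basis states.
    n_qubits = max(1, math.ceil(math.log2(last + 1)))
    size = 2 ** n_qubits
    # Oracle predicate: index i is marked if the pattern starts at i (padding is never marked).
    marked = [i <= last and text[i:i + m] == pattern for i in range(size)]
    num_marked = sum(marked)
    if num_marked == 0:
        return None
    # Uniform superposition, as produced by Hadamard gates on every index qubit.
    amps = [1 / math.sqrt(size)] * size
    # Near-optimal iteration count when the number of solutions M is known.
    iterations = math.floor(math.pi / 4 * math.sqrt(size / num_marked))
    for _ in range(iterations):
        # Oracle: flip the phase of every marked index.
        amps = [-a if hit else a for a, hit in zip(amps, marked)]
        # Diffusion operator: inversion about the mean amplitude.
        mean = sum(amps) / size
        amps = [2 * mean - a for a in amps]
    probs = [a * a for a in amps]
    best = max(range(size), key=probs.__getitem__)
    return best, probs[best]
```

For example, with the 16-character text `ACGTTGCAACGTAGCT` and the pattern `AGCT` (a single match at index 12), \(N = 16\) and three iterations concentrate about 96% of the measurement probability on index 12, compared with \(1/16\) before amplification.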
References
Tao, T.; Mukherjee, A.: Pattern-Matching in LZW Compressed Files. IEEE Trans. on Computers 54(8), 929–938 (2005)
Das, S.; Kapoor, K.: Weighted approximate parameterized string matching. AKCE International Journal of Graphs and Combinatorics 14, 1–12 (2017)
Hakak, S.I.; Kamsin, A.: Exact String Matching Algorithms: Survey, Issues, and Future Research Directions. IEEE Access 7, 69614–69637 (2019)
Neamatollahi, P.; Hadi, M.; Naghibzadeh, M.: Simple and Efficient Pattern Matching Algorithms for Biological Sequences. IEEE Access 8, 23838–23846 (2020)
Faro, S.; Lecroq, T.: The Exact Online String Matching Problem: A Review of the Most Recent Results. ACM Comput. Surv. 45(2), 1–42 (2013)
Rivals, E.; Salmela, L.; Tarhio, J.: Exact Search Algorithm for Biological Sequences. Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications, John Wiley and Sons, pp. 91–111 (2011)
Zou, D.; Ma, L.; Yu, J.; Zhang, Z.: Biological Databases for Human Research. Genomics Proteomics Bioinformatics 13, 55–63 (2015)
Kalsi, P.; Peltola, H.; Tarhio, J.: Comparison of Exact String Matching Algorithms for Biological Sequences. CCIS Springer 13, 417–426 (2008)
Knuth, D.E.; Morris, J.H.; Pratt, V.R.: Fast pattern matching in strings. SIAM Journal on Computing 6(2), 323–350 (1977)
Boyer, R.S.; Moore, J.S.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977)
Aho, A.V.; Corasick, M.J.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975)
Kouzinopoulos, C.S.; Michailidis, P.D.; Margaritis, K.G.: Parallel processing of multiple pattern matching algorithms for biological sequences: methods and performance results. In: Yang, N.S. (ed.) Bioinformatics – Computational Biology and Modeling, pp. 161–182. IntechOpen, London (2011)
Lin, C.-H.; Chien, L.-S.: Accelerating Pattern Matching Using a Novel Parallel Algorithm on GPUs. IEEE Trans. on Computers 62(10), 1906–1916 (2013)
Nielsen, M.; Chuang, I.: Quantum Computation and Quantum Information, 10th edn. Cambridge University Press, Cambridge (2010)
Grover, L.K.: A fast quantum mechanical algorithm for database search. In: Proceedings of ACM STOC 1996, pp. 212–219, ACM (1996)
Ramesh, H.; Vinay, V.: String Matching in Õ(√n + √m) Quantum Time. Journal of Discrete Algorithms 1, 103–110 (2003)
Mateus, P.: A Quantum Algorithm for Closest Pattern Matching. Int. J. of Theoretical Physics 52, 3970–3980 (2003)
Aborot, J.: Quantum Approximate String Matching for Large Alphabets. Theory and Practice of Computation, World Scientific 20, 37–50 (2017)
De Jesus, B.K.A.; Aborot, J.A.; Adorna, H.N.: Solving the Exact Pattern Matching Problem Constrained to Single Occurrence of Pattern P in String S Using Grover’s Quantum Search Algorithm. Theory and Practice of Computation, Springer Tokyo 7, 124–142 (2013)
Montanaro, A.: Quantum Pattern Matching Fast on Average. Algorithmica 77, 16–39 (2017)
Giovannetti, V.; Lloyd, S.; Maccone, L.: Quantum random access memory. Phys. Rev. Lett. 100, 1–4 (2008)
Park, D.K.; Petruccione, F.; Rhee, J.-K.K.: Circuit-Based Quantum Random Access Memory for Classical Data. Scientific Reports 9(3949), 1–8 (2019)
Lanzagorta, M.; Uhlmann, J.: Quantum Computer Science. Synthesis Lectures on Quantum Computing, Morgan & Claypool (2008)
Chakrabarty, I.; Khan, S.; Singh, V.: Dynamic Grover Search: Application in Recommendation System and Optimization Problems. Quantum Inf. Process. 16, 152–172 (2017)
Mandviwalla, A.; Ohshiro, K.; Ji, B.: Implementing Grover’s Algorithm on the IBM Quantum Computers. In: Proceedings of the IEEE International Conference on Big Data 2018, pp. 2531–2537, IEEE (2018)
Jones, T.; Brown, A.; Bush, I.; Benjamin, S.C.: QuEST and High Performance Simulation of Quantum Computers. Scientific Reports 9(10736), 1–9 (2019)
Hao, X.; Zhang, F.; Xia, S.; Zhou, Y.: Quantum Algorithms for Learning the Algebraic Normal Form of Quadratic Boolean Functions. Quantum Inf. Process. 19(273), 1–22 (2020)
Bogdanov, Yu.I.; Bogdanova, N.A.; Fastovets, D.V.; Lukichev, V.F.: Representation of Boolean Function in terms of Quantum Computations. In: International Conference on Micro- and Nano-Electronics, pp. 1–19 (2018)
Acknowledgement
The authors thank the other researchers working in this domain for sharing their ideas, and the developers of the \(QuEST\) quantum library, which helped us validate the proposed algorithms.
Appendices
Appendix A
Table 8 summarizes the proposed algorithms in terms of their advantages, limitations, and preferred biological applications, with reference to Analysis Notes 1 to 4 and Application Notes 1 to 4. From the perspective of quantum pattern matching algorithm design, we observed a general limitation: the probability of obtaining the correct search result can be affected as the size of the text database increases. For all multiple pattern algorithms, we noted that performance varies with pattern length when the patterns are of unequal size. The possible lengths of text and pattern are considered with respect to the alphabet size, and on this basis we summarize the biological applications for which our algorithms are found suitable.
Appendix B
The experiments were performed using \(QuEST\) simulation to evaluate the average execution time and the utilization of \(RAM\) workspace. We also recorded a standard \(QASM\) log to observe the number of quantum gates applied to the quantum registers during algorithmic simulation [26]. These details are listed in Table 9. The results and observations are discussed separately in the following simulation notes.
Simulation Note 2. The outcomes listed in Table 9 confirm that the number of solutions is correctly reported at each identified index \(|T_{i} \rangle_{AR} \in |T_{n} \rangle_{QuRAM}\). Table 7 shows that our expected qubit requirements correspond exactly to the qubits observed under \(QuEST\) simulation. To obtain average outcomes, each \(QuEST\) simulation experiment was repeated 20 times. For each of the three text file sizes and the different pattern lengths, we recorded the average execution time (in s) and the \(RAM\) workspace utilization (in KiB). The average execution time of \(QuEMS\_US\) is lower than that of \(QuEMS\_ES\) because, in the unequal-sized case, the second pattern is shorter.
Similarly, we observed the same relationship between the average execution times of the \(QuEMM\_US\) and \(QuEMM\_ES\) algorithms. The average execution time of \(QuEMM\) is higher than that of the \(QuEMS\) algorithms, because \(QuEMS\) searches only for a single occurrence of each pattern, whereas \(QuEMM\) searches for all occurrences of multiple pattern strings. However, we observed an exception: the average execution time of the single pattern \(QuESM\) algorithm is higher than that of both \(QuEMS\_US\) and \(QuEMM\_US\). Since the \(QuEMS\_US\) algorithm searches only for a single occurrence of each of the multiple patterns, its execution time can be lower.
In contrast, \(QuESM\) searches for all occurrences of a single pattern, so its average execution time may exceed that of the \(QuEMS\_US\) algorithm. The larger average execution time of \(QuESM\) compared with \(QuEMM\_US\), however, is an exception. We attribute it to a random increase in the depth of the Boolean function realized for \(QuRAM\) through the algebraic normal form \(\left( {ANF} \right)\). On an actual quantum machine, we expect these differences in average execution time to be negligible for small text and pattern sizes.
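The averaging protocol of Simulation Note 2 (20 repetitions, mean time in seconds, mean memory in KiB) can be sketched in Python. This is an illustration under assumptions, not necessarily how the reported figures were obtained: `profile` is a hypothetical helper, and the standard-library `tracemalloc` module measures only Python-level allocations, not the resident memory of a C library such as \(QuEST\).

```python
import statistics
import time
import tracemalloc


def profile(fn, *args, repeats=20):
    """Run fn(*args) `repeats` times; return (mean seconds, mean peak KiB).

    Peak memory is the tracemalloc peak of Python allocations during each run.
    """
    times, peaks_kib = [], []
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        fn(*args)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
        times.append(elapsed)
        peaks_kib.append(peak / 1024)
    return statistics.mean(times), statistics.mean(peaks_kib)
```

Averaging over repeated runs smooths out scheduler and cache noise; reporting the standard deviation alongside the mean (via `statistics.stdev`) would also show how stable each measurement is.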
Simulation Note 3. Table 9 also reports the utilization of classical \(RAM\) workspace (in KiB), measured throughout the execution phase of the simulation program. Memory consumption is recorded for each of the three text file sizes, and a simple correlation is found across all executed algorithms. The memory utilization of \(QuEMS\_US\) and \(QuEMM\_US\) is lower than that of \(QuEMS\_ES\) and \(QuEMM\_ES\) because the second pattern is shorter. The \(RAM\) requirement of \(QuEMM\) is higher than that of \(QuEMS\) because it searches for all occurrences of multiple pattern strings. The single pattern \(QuESM\) algorithm requires slightly less \(RAM\) than all the other executions. Overall, the classical \(RAM\) utilization of the implemented algorithms falls within a narrow range.
The memory consumption of our algorithms is therefore closely similar rather than widely deviating. This is not an exception; it arises because the \(ANF\) is used to realize the unitary \(U_{LD}\) for \(QuRAM\). Since Hadamard gates place the text indices in superposition and the \(ANF\) builds a superposition of coherently correlated data, we expect slight variations in the depth of the Boolean functions, whose depth is at most \(O\left( {2^{n} } \right)\). Classical memory consumption is therefore high, and a consistent amount of memory space \(\left( {RAM} \right)\) is needed throughout the execution of the programs.
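Why the \(ANF\) realization can reach depth \(O\left( {2^{n} } \right)\) is easiest to see by computing the \(ANF\) of a Boolean function from its truth table with the binary Möbius transform. The sketch below assumes the common positive-polarity Reed–Muller (ANF) synthesis, in which each monomial \(x_{i_1} \cdots x_{i_k}\) becomes one \(C^{k} NOT\) gate controlled by the corresponding address qubits (a constant term becomes an \(X\) gate). This is an illustrative assumption, not necessarily the exact circuit used for \(U_{LD}\), and the helper names are hypothetical.

```python
def anf_monomials(truth_table):
    """ANF monomials of f, given truth_table[x] = f(x) for x in 0..2**n - 1.

    Bit i of x is variable x_i. Each monomial is returned as a bitmask of the
    variables in the product; mask 0 is the constant term 1.
    """
    size = len(truth_table)
    assert size and size & (size - 1) == 0, "length must be a power of two"
    coeffs = [b & 1 for b in truth_table]
    step = 1
    while step < size:
        # Binary Moebius transform: fold each variable in turn (XOR over subsets).
        for x in range(size):
            if x & step:
                coeffs[x] ^= coeffs[x ^ step]
        step <<= 1
    return [mask for mask, c in enumerate(coeffs) if c]


def oracle_gate_counts(truth_table):
    """Count gates if every ANF monomial of k variables becomes one C^k NOT."""
    counts = {}
    for mask in anf_monomials(truth_table):
        k = bin(mask).count("1")
        name = "X" if k == 0 else f"C^{k}NOT"
        counts[name] = counts.get(name, 0) + 1
    return counts
```

For instance, \(x_0 \vee x_1\) has truth table `[0, 1, 1, 1]` and \(ANF\) \(x_0 \oplus x_1 \oplus x_0 x_1\), i.e. two CNOTs and one Toffoli. In the worst case, such as a function that is 1 only at the all-zeros address, all \(2^{n}\) monomials appear, which matches the \(O\left( {2^{n} } \right)\) bound.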
Simulation Note 4. We recorded a log of standard quantum assembly \(\left( {QASM} \right)\) instructions, from which we obtained the number of quantum gates used during the simulation of quantum circuits. The tuple \(\left( {H, X, R_{z} \left( {\theta = 0} \right), C^{n - 1} Z, C^{k} NOT} \right)\) lists the quantum gates observed during simulation. The rightmost column of Table 9 gives the individual gate count of each algorithm in the order of this tuple. Gate counts are reported either for the single pattern \(P\) or for the multiple pattern strings \(P1\) and \(P2\) (of equal or unequal sizes), and each gate is counted separately for each of the three text file sizes (32, 128, 512). As expected, the gate counts increase and decrease in proportion to the sizes of the binary-encoded text file and the binary-encoded pattern(s).
We also noted some exceptions in the standard \(QASM\) log file: the equal-sized multiple pattern algorithms \(QuEMS\_ES\) and \(QuEMM\_ES\) show increased gate counts (approximately double) for the \(R_{z}\) and \(C^{k} NOT\) gates. This was observed between the patterns \(P1\) and \(P2\), and only for the text file size of 512 characters. This again results from the random increase in the circuit depth of the Boolean function realized for \(QuRAM\) through the \(ANF\).
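Tallying gates from a recorded log, as in Simulation Note 4, can be sketched as a small parser. The sketch assumes OpenQASM 2.0-style statements (`name(params) operands;`, `//` comments); the exact text that \(QuEST\) records may differ, so gate mnemonics such as `ccx` and the helper name `count_gates` are illustrative assumptions, and declarations, measurements, and barriers are skipped rather than counted.

```python
import re
from collections import Counter

# Gate mnemonic at the start of a statement, e.g. "h", "rz", "ccx".
_GATE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)")
# Statements that are not gate applications (assumed OpenQASM 2.0 keywords).
_SKIP = {"OPENQASM", "include", "qreg", "creg", "measure", "barrier"}


def count_gates(qasm_text):
    """Return a Counter of gate mnemonics found in an OpenQASM-like log."""
    counts = Counter()
    for line in qasm_text.splitlines():
        code = line.split("//", 1)[0]  # drop trailing comments
        for stmt in code.split(";"):   # a line may hold several statements
            match = _GATE.match(stmt)
            if match and match.group(1) not in _SKIP:
                counts[match.group(1).lower()] += 1
    return counts
```

Comparing such counts across the 32-, 128- and 512-character runs is one way to spot anomalies like the roughly doubled \(R_{z}\) and \(C^{k} NOT\) counts noted above.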
Throughout our discussion, we were concerned with the qubit requirement, because the classical memory needed for simulation grows exponentially with the number of qubits, which in turn causes an exponential increase in processing time. \(QuEST\) performance also depends on how the algorithm scales in its use of qubits, and it is limited by the configuration of the underlying classical machine. On our machine configuration, quantum simulation was limited to 25 qubits. We therefore implemented the \(QuESM\), \(QuEMS\) and \(QuEMM\) algorithms, which were feasible to simulate. However, as Table 10 shows, the \(QuAMM\) algorithm requires an implementation of the unitary \(U_{HD}\) for approximate matching. This design has the same qubit complexity that we derived theoretically for the \(APM\) algorithm (Table 4). Because of large multiplicative constants, it requires more qubits, so simulating the \(QuAMM\) algorithms is infeasible. For each of the three file sizes and the assumed patterns, the “Expected Qubits” column of Table 10 lists the expected qubit requirement, which exceeds the capability of our machine.
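The exponential link between qubits and classical memory can be made concrete with a back-of-the-envelope calculation. The sketch assumes a dense state-vector simulator that stores one double-precision complex amplitude (16 bytes) per basis state; it ignores any additional workspace a real simulator such as \(QuEST\) may allocate, so practical limits are lower. The function names are hypothetical.

```python
def statevector_bytes(num_qubits, bytes_per_amplitude=16):
    """Bytes for a pure state of `num_qubits` qubits: 2**n amplitudes.

    Default assumes one double-precision complex number (2 x 8 bytes) each.
    """
    return (2 ** num_qubits) * bytes_per_amplitude


def max_qubits(available_bytes, bytes_per_amplitude=16):
    """Largest qubit count whose state vector alone fits in `available_bytes`."""
    n = 0
    while statevector_bytes(n + 1, bytes_per_amplitude) <= available_bytes:
        n += 1
    return n
```

Under this assumption, 25 qubits need \(2^{25} \times 16\) B = 512 MiB, and every additional qubit doubles the requirement, so even a few extra qubits from large multiplicative constants quickly exceed a workstation's memory.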
Appendix C
Our algorithms are implemented with the \(QuEST\) library to validate them. A genome sequence file of hot pepper (Capsicum annuum) is available at (http://plants.ensembl.org/Capsicum_annuum/Info/Index). The subset genome dataset and the \(QuEST\)-specific simulation code are hosted on github.com, and all algorithm code is publicly available at (https://github.com/profkapilsoni/QuQPMA).
Cite this article
Soni, K.K., Malviya, A.K. Design and Analysis of Pattern Matching Algorithms Based on QuRAM Processing. Arab J Sci Eng 46, 3829–3851 (2021). https://doi.org/10.1007/s13369-020-05310-y