research-article

Linear Time Membership in a Class of Regular Expressions with Counting, Interleaving, and Unordered Concatenation

Published: 13 November 2017

Abstract

Regular Expressions (REs) are ubiquitous in database and programming languages. While many applications make use of REs extended with interleaving (shuffle) and unordered concatenation operators, this extension badly affects the complexity of basic operations, and, especially, makes membership NP-hard, which is unacceptable in most practical scenarios.

In this article, we study the problem of membership checking for a restricted class of these extended REs, called conflict-free REs, which are expressive enough to cover the vast majority of real-world applications. We present several polynomial-time algorithms for membership checking over conflict-free REs; they differ in the optimization techniques adopted and in the operators supported. As a particular application, we generalize the approach to check membership of Extensible Markup Language (XML) trees in a class of EDTDs (Extended Document Type Definitions) that models the crucial aspects of DTDs (Document Type Definitions) and XSD (XML Schema Definitions) schemas.

The results of an extensive experimental analysis validate the efficiency of the presented membership checking techniques.
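
To make the interleaving (shuffle) operator mentioned in the abstract concrete, the following minimal sketch checks whether a word is an interleaving of two fixed words. It is only an illustration of the operator's semantics, not one of the article's algorithms: the function name `is_interleaving` and the restriction to two plain words (rather than full regular expressions) are assumptions made for this example.

```python
from functools import lru_cache


def is_interleaving(w: str, u: str, v: str) -> bool:
    """Return True if w merges u and v while preserving each word's own order.

    Illustrative only: this handles two fixed words, not the conflict-free
    regular expressions studied in the article.
    """
    if len(w) != len(u) + len(v):
        return False

    @lru_cache(maxsize=None)
    def match(i: int, j: int) -> bool:
        # i symbols of u and j symbols of v are consumed,
        # so the first i + j symbols of w are already matched.
        if i == len(u) and j == len(v):
            return True
        k = i + j
        take_u = i < len(u) and u[i] == w[k] and match(i + 1, j)
        take_v = j < len(v) and v[j] == w[k] and match(i, j + 1)
        return take_u or take_v

    return match(0, 0)


if __name__ == "__main__":
    print(is_interleaving("acbd", "ab", "cd"))
    print(is_interleaving("bacd", "ab", "cd"))
```

Memoizing on the pair `(i, j)` keeps this two-word case polynomial. The hardness noted in the abstract concerns membership for general regular expressions extended with interleaving, which this sketch does not attempt.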


        • Published in

          ACM Transactions on Database Systems, Volume 42, Issue 4
          Invited Paper from SIGMOD 2016, Invited Paper from PODS 2016, Invited Paper from ICDT 2016 and Regular Papers
          December 2017
          241 pages
          ISSN:0362-5915
          EISSN:1557-4644
          DOI:10.1145/3155316

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 13 November 2017
          • Accepted: 1 August 2017
          • Revised: 1 June 2017
          • Received: 1 September 2016

          Qualifiers

          • research-article
          • Research
          • Refereed
