Inclusion algorithms for one-unambiguous regular expressions and their applications,Science of Computer Programming

当前位置： X-MOL 学术 › Sci. Comput. Program. › 论文详情

Our official English website, www.x-mol.net, welcomes your feedback! (Note: you will need to create a separate account there.)

Inclusion algorithms for one-unambiguous regular expressions and their applications
Science of Computer Programming ( IF 1.5 ) Pub Date : 2020-03-10 , DOI: 10.1016/j.scico.2020.102436
Haiming Chen , Zhiwu Xu

One-unambiguous regular expressions are used in DTD and XML Schema. It is known that inclusion for one-unambiguous regular expressions is in PTIME. However, there has been few studies on algorithms for the inclusion. In this paper we present algorithms for checking inclusion of one-unambiguous regular expressions. A classical way is based on automata, following which one algorithm is provided and improvements are given. The other algorithm is based on derivatives, utilizing a property presented here that the number of derivatives of a one-unambiguous regular expression is finite. We have applied the algorithms to XML typechecking. The results of experiments with the algorithms are also included. First we give comparisons of the efficiency of our algorithms by experiments. Since after our work Hovland has given another algorithm, we also included his algorithm in the experiments. The results show that both of our algorithms are more efficient than Hovland's algorithm for one-unambiguous regular expressions, and under the inclusion mode (see Section 6) the derivative-based algorithm is more efficient than the automata-based one for small expressions, while for large expressions the latter is more efficient. Then we have conducted preliminary experiments by implementing typechecking of XML using the algorithms. The results show that typechecking using our algorithms is more efficient than typechecking using XDuce. Comparisons of the algorithms with CDuce are also given.

中文翻译：

一个明确的正则表达式的包含算法及其应用

在DTD和XML Schema中使用唯一的正则表达式。众所周知，在PTIME中包含一个明确的正则表达式。但是，关于包含算法的研究很少。在本文中，我们提出了一种用于检查一个明确的正则表达式是否包含的算法。一种经典的方法是基于自动机的，其后提供了一种算法并给出了改进。另一种算法是基于导数的，利用了此处介绍的一个明确的正则表达式的导数是有限的属性。我们已将算法应用于XML类型检查。该算法的实验结果也包括在内。首先，我们通过实验比较算法的效率。自从我们的工作之后，霍夫兰德给出了另一种算法，我们还在实验中包括了他的算法。结果表明，对于一种明确的正则表达式，我们的两种算法都比Hovland算法更有效，并且在包含模式下（请参见第6节），对于小表达式，基于导数的算法比基于自动机的算法更有效，而对于大型表达式，后者更为有效。然后，我们通过使用算法实现XML的类型检查进行了初步的实验。结果表明，使用我们的算法进行类型检查比使用XDuce进行类型检查更为有效。还给出了与CDuce算法的比较。在包含模式下（请参见第6节），对于小表达式，基于导数的算法比基于自动机的算法更有效，而对于大表达式，后者的效率更高。然后，我们通过使用算法实现XML的类型检查进行了初步的实验。结果表明，使用我们的算法进行类型检查比使用XDuce进行类型检查更为有效。还给出了与CDuce算法的比较。在包含模式下（请参见第6节），对于小表达式，基于导数的算法比基于自动机的算法更有效，而对于大表达式，后者的效率更高。然后，我们通过使用算法实现XML的类型检查进行了初步的实验。结果表明，使用我们的算法进行类型检查比使用XDuce进行类型检查更为有效。还给出了与CDuce算法的比较。

更新日期：2020-03-10

点击分享查看原文

点击收藏

阅读更多本刊最新论文本刊介绍/投稿指南11