当前位置: X-MOL 学术IEEE ACM Trans. Netw. › 论文详情
Our official English website, www.x-mol.net, welcomes your feedback! (Note: you will need to create a separate account there.)
A De-Compositional Approach to Regular Expression Matching for Network Security
IEEE/ACM Transactions on Networking ( IF 3.0 ) Pub Date : 2019-10-28 , DOI: 10.1109/tnet.2019.2941920
Alex X. Liu , Eric Norige

Regular Expression (RegEx) matching is the industry standard for Deep Packet Inspection (DPI) because RegExes are significantly more expressive than strings. To achieve high matching speed, we need to convert the RegExes to Deterministic Finite State Automata (DFA). However, DFA has the state explosion problem, that is, the number of DFA states and transitions can be exponential with the number of RegExes. Much work has addressed the DFA state explosion problem; however, none has met all the requirements of fast and automated construction, small memory image, and high matching speed. In this paper, we propose a decompositional approach, with fast and automated construction, small memory image, and high matching speed, to DFA state explosion. The first key idea is to decompose a complex RegEx that cause exponential state increases into a set of simpler RegExes that do not cause exponential state increases, where any character string that matches the complex RegEx also matches all the RegExes in the set of simpler RegExes; that is, the set of strings that match the complex RegEx is a subset of strings that match the set of simpler RegExes. The second key idea is to use a stateful post-processing engine to filter the matches that are actually the matches of the complex RegEx. Given an input string for matching, instead of using the large DFA constructed from the original complex RegEx to perform the matching, we first use the small DFA constructed from the set of simpler RegExes to perform the matching, and then, if the small DFA reports a match, we use the post-processing engine to determine whether it is a true match to the original complex RegEx. Because the pre-processing is simple, automaton construction can be automated and fast, and because most on-line processing is done by a DFA, its matching speed is close to that of a DFA alone. Our experimental results show that our decompositional approach achieves orders of magnitude faster DFA construction (in terms of seconds instead of minutes), 30 times smaller memory image, and 43% faster matching speeds, than state-of-the-art software based RegEx matching algorithms.

中文翻译:

一种用于网络安全的正则表达式匹配的分解方法

正则表达式(RegEx)匹配是深度包检查(DPI)的行业标准,因为正则表达式比字符串具有更高的表现力。为了实现高匹配速度,我们需要将RegExes转换为确定性有限状态自动机(DFA)。但是,DFA存在状态爆炸问题,即DFA状态和转换的数量与RegExes的数量成指数关系。许多工作已经解决了DFA状态爆炸问题;但是,没有一个软件能够满足快速和自动构造,小内存映像和高匹配速度的所有要求。在本文中,我们提出了一种分解方法,该方法具有快速且自动化的构造,小内存映像和高匹配速度的DFA状态爆炸。第一个关键思想是将导致指数状态增加的复杂RegEx分解为一组不会导致指数状态增加的简单RegEx,其中,与复杂RegEx匹配的任何字符串也将与这组简单RegExes中的所有RegEx匹配;也就是说,与复杂RegEx匹配的字符串集是与较简单RegExes匹配的字符串的子集。第二个关键思想是使用有状态的后处理引擎来过滤匹配项,这些匹配项实际上是复杂RegEx的匹配项。给定用于匹配的输入字符串,我们首先使用从较简单的RegExes集合构造的小DFA进行匹配,而不是使用从原始复杂RegEx构造的大DFA进行匹配,然后,如果小DFA报告一场比赛 我们使用后处理引擎来确定它是否与原始的复杂RegEx真正匹配。因为预处理很简单,所以自动机的构建可以实现自动化和快速,并且由于大多数在线处理都是由DFA完成的,因此其匹配速度接近于单独的DFA。我们的实验结果表明,与基于最新软件的RegEx匹配相比,我们的分解方法可将DFA的构建速度提高数个数量级(以秒为单位,而不是分钟),将存储映像缩小30倍,并且将匹配速度提高43%算法。
更新日期:2020-01-04
down
wechat
bug