The Complexity of Aggregates over Extractions by Regular Expressions



arXiv - CS - Formal Languages and Automata Theory. Pub Date: 2020-02-20, DOI: arxiv-2002.08828
Johannes Doleschal, Noa Bratman, Benny Kimelfeld, Wim Martens

Regular expressions with capture variables, also known as "regex formulas," extract relations of spans (intervals identified by their start and end indices) from text. Based on these, Fagin et al. introduced regular document spanners, which are the closure of regex formulas under Relational Algebra. In this work, we study the computational complexity of querying text by aggregate functions, such as sum, average, or quantiles, on top of regular document spanners. To this end, we formally define aggregate functions over regular document spanners and analyze the computational complexity of exact and approximate computation of the aggregates. To be precise, we show that in a restricted case all aggregates can be computed in polynomial time. In general, however, even though exact computation is intractable, some aggregates can still be approximated with fully polynomial-time randomized approximation schemes (FPRAS).
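
As a rough illustration of the setting (not the paper's formalism or algorithms), the sketch below uses Python's `re` module to extract spans for named capture groups and then aggregates the numbers those spans cover. The document string, the pattern, and the helper names `extract_spans` and `span_value` are assumptions made up for this example. Note also that `re.finditer` yields only non-overlapping leftmost matches, whereas a regex formula in the spanner framework yields every valid assignment of spans to variables, and that Python spans are 0-indexed and half-open.

```python
import re
from statistics import mean, median

# Hypothetical document and pattern, chosen only for illustration.
DOCUMENT = "apples 3, pears 12, plums 7"
PATTERN = re.compile(r"(?P<item>[a-z]+) (?P<qty>\d+)")


def extract_spans(pattern, text):
    """Return one row per match, mapping each capture variable to its span."""
    return [
        {name: m.span(name) for name in pattern.groupindex}
        for m in pattern.finditer(text)
    ]


def span_value(text, span):
    """Interpret the substring covered by a span as an integer."""
    start, end = span
    return int(text[start:end])


# The extracted relation: one row of (start, end) spans per match.
rows = extract_spans(PATTERN, DOCUMENT)

# Aggregates are computed over the values that the 'qty' spans point to.
quantities = [span_value(DOCUMENT, row["qty"]) for row in rows]
total = sum(quantities)
average = mean(quantities)
middle = median(quantities)  # the 0.5-quantile
```

Here the relation is small and materialized explicitly, so the aggregates are trivial to compute. The complexity questions in the abstract concern the general case, where the relation defined by a spanner can be much larger than the document itself.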


Updated: 2020-04-10
