Parallelizing Computations of Full Disjunctions
Big Data Research ( IF 3.3 ) Pub Date : 2019-07-12 , DOI: 10.1016/j.bdr.2019.07.002
Matteo Paganelli , Domenico Beneventano , Francesco Guerra , Paolo Sottovia

In relational databases, the full disjunction operator is an associative extension of the full outerjoin to an arbitrary number of relations. Its goal is to maximize the information we can extract from a database by connecting all tables through all join paths. Full disjunctions have been proposed for several scenarios, such as data integration and knowledge extraction. One of the main obstacles to their adoption in real business scenarios is the long time their computation requires. This paper addresses this limitation by introducing parafd, a novel approach based on parallel computing techniques that implements the full disjunction operator in both an exact and an approximate version. Our proposal has been compared with state-of-the-art algorithms, which have also been re-implemented to run in parallel. The experiments show that parafd is faster than the existing approaches. Finally, we have tested the full disjunction as a collection of documents indexed by a textual search engine, which provides a simple technique for performing keyword search over relational databases. The results obtained on a benchmark show high precision and recall, even when compared with existing proposals.
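
To make the operator described in the abstract concrete, the following is a minimal sketch of a naive, sequential full disjunction. It is not the parafd algorithm, and it runs in exponential time. It assumes input tuples contain no nulls, that two tuples are join consistent when they agree on every shared attribute, and that a tuple set qualifies only if its tuples are linked through shared attributes. The relations R, S, T and the function names are hypothetical, chosen only for illustration.

```python
from itertools import combinations, product


def join_consistent(t1, t2):
    """Two tuples are join consistent if they agree on every shared attribute."""
    return all(t1[a] == t2[a] for a in set(t1) & set(t2))


def connected(tuples):
    """True if the tuples form one connected component when tuples that
    share at least one attribute are linked."""
    seen, frontier = {0}, [0]
    while frontier:
        i = frontier.pop()
        for j in range(len(tuples)):
            if j not in seen and set(tuples[i]) & set(tuples[j]):
                seen.add(j)
                frontier.append(j)
    return len(seen) == len(tuples)


def full_disjunction(relations):
    """Naive full disjunction over a dict: relation name -> list of dict tuples.

    Assumption: input tuples contain no nulls. Exponential in the number
    of relations, so suitable for toy inputs only.
    """
    names = list(relations)
    attrs = sorted({a for rows in relations.values() for r in rows for a in r})
    candidates = []
    # Pick every non-empty subset of relations, then one tuple from each.
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            for combo in product(*(relations[n] for n in subset)):
                pairs_ok = all(join_consistent(a, b)
                               for a, b in combinations(combo, 2))
                if pairs_ok and connected(combo):
                    merged = {}
                    for t in combo:
                        merged.update(t)
                    if merged not in candidates:
                        candidates.append(merged)

    def subsumed(s, t):
        # s is subsumed by t if t carries all of s's values and more.
        return s != t and all(a in t and t[a] == v for a, v in s.items())

    maximal = [c for c in candidates
               if not any(subsumed(c, o) for o in candidates)]
    # Pad missing attributes with None (playing the role of SQL NULL).
    return [{a: c.get(a) for a in attrs} for c in maximal]


# Hypothetical toy database: R(a, b), S(b, c), T(c, d).
relations = {
    "R": [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}],
    "S": [{"b": "x", "c": 10}],
    "T": [{"c": 10, "d": "p"}, {"c": 20, "d": "q"}],
}
result = full_disjunction(relations)
for row in result:
    print(row)
```

In this toy input, the R tuple with b = "y" and the T tuple with c = 20 have no join partner, so they survive only as null-padded rows, while the remaining tuples combine along the path R, S, T into a single row. Retaining such partial matches alongside the complete ones is what connecting all tables through all join paths amounts to.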




Updated: 2019-07-12